STATISTICS 


FOUNDED AND EDITED BY H. C. CARVER, 1930-1938


SUFFICIENCY AND STATISTICAL DECISION FUNCTIONS 
By R. R. Bahadur
Columbia University 


Summary. This paper contains an account, in abstract terms, of sufficiency 
and of its role in statistical decision problems. The study of sufficiency in ab- 
stract terms was initiated by Halmos and Savage [1], and the present paper, 
although self-contained, is to be regarded as a continuation of their work. The 
main objects of the paper are to show that the justification for the use of suffi- 
cient statistics in statistical methodology which is sketched in the final section 
of [1] is valid under certain quite general conditions, and to extend this justifica- 
tion to the case of sequential experiments. The paper falls into two parts of 
which the first (Sections 2-7) is mainly expository and provides an account of 
the theory of sufficiency in the nonsequential case. The second part (Sections 
8-11) then extends the theory to sequential experiments. 


1. Introduction. In a given experimental program, let X be the sample space of
all possible outcomes x, and suppose that x is distributed in X according to an
unknown one of a certain set P of probability measures p. Let T be a function
of x, and let Y be the set of all values of T. The function T is said to be a
sufficient statistic if, for each subset A of X and for each y in Y, the
conditional probability of A given T(x) = y is the same for every p in P.

It is well known that, in most applications, P is a dominated set of measures,
that is, there exists a measure λ such that each p in P admits a probability
density function with respect to λ. In this case, a statistic T is sufficient
if and only if each of the probability density functions can be written as the
product of two factors, the first factor being the same for each density and
the second depending on x only through T, say

\[ p(A) = \int_A h(x)\, g_p[T(x)]\, d\lambda \]

for all sets A and each p in P (Corollary 6.1).
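As an illustration of the definition and of the factorization above, the following minimal sketch takes the familiar case of n independent Bernoulli trials, which is an assumption of the example and not part of the paper. The statistic is T(x) = x_1 + ... + x_n, the factor h(x) is 1, and g_p(t) = θ^t (1 − θ)^(n − t). The function name `conditional_given_sum` is hypothetical. Exact rational arithmetic is used so that the comparison between two values of θ is not affected by rounding.

```python
from fractions import Fraction
from itertools import product


def conditional_given_sum(n, theta):
    """Map each outcome x in {0,1}^n to P(x | T(x) = sum(x)) under i.i.d. Bernoulli(theta)."""
    # Joint probability of each outcome: h(x) = 1 and g_p(t) = theta^t (1 - theta)^(n - t).
    joint = {x: theta ** sum(x) * (1 - theta) ** (n - sum(x))
             for x in product((0, 1), repeat=n)}
    # Marginal probability of each value t of the statistic T(x) = sum(x).
    marginal = {}
    for x, prob in joint.items():
        marginal[sum(x)] = marginal.get(sum(x), 0) + prob
    return {x: prob / marginal[sum(x)] for x, prob in joint.items()}


# Two different members of the (assumed) family P.
cond_a = conditional_given_sum(3, Fraction(1, 3))
cond_b = conditional_given_sum(3, Fraction(3, 4))
same_for_both = cond_a == cond_b
```

In this sketch the two conditional distributions coincide, as the definition of a sufficient statistic requires; each outcome with t successes receives conditional probability 1 / C(n, t).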

In a statistical decision problem, let D be the set of all decisions from which
the statistician is required to select some one decision, on the basis of the
observed outcome. This set D is called the decision space. A (possibly
randomized) function of x which takes values in D is called a decision function
based on x. If T is a sufficient statistic, then, corresponding to any decision
function μ based on x, there exists a decision function ν based on y such that,
for each p, the values of μ and ν are identically distributed. Consequently, in
his search for a “good” decision function, the statistician may confine his
attention to decision functions based on y, that is, to decision functions
whose values depend on the outcome only through T. It is shown that this
reduction of the decision problem by means of a sufficient statistic is valid
if D is, or may be taken to be, a subset of a euclidean space (Theorem 7.1).

Received 5/5/52, revised 6/28/54.
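The reduction just described can be checked by direct enumeration in a small case. The sketch below again assumes n independent Bernoulli trials with T(x) equal to the number of successes; the decision function `mu` (report the first observation) and the helper names are hypothetical and chosen only for illustration. The randomized decision function `nu` is obtained by averaging `mu` over the conditional distribution of x given T(x) = t, which does not involve θ.

```python
from fractions import Fraction
from itertools import product
from math import comb


def mu(x):
    """A decision function based on x: report the first observation."""
    return x[0]


def nu(t, n):
    """Randomized decision function based on t = sum(x), as {decision: probability}.

    Uses the conditional law of x given T = t, which is uniform over the
    outcomes with t ones and therefore free of theta.
    """
    dist = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == t:
            dist[mu(x)] = dist.get(mu(x), Fraction(0)) + Fraction(1, comb(n, t))
    return dist


def law_of_mu(n, theta):
    """Distribution of mu(x) when x is i.i.d. Bernoulli(theta)."""
    dist = {}
    for x in product((0, 1), repeat=n):
        prob = theta ** sum(x) * (1 - theta) ** (n - sum(x))
        dist[mu(x)] = dist.get(mu(x), Fraction(0)) + prob
    return dist


def law_of_nu(n, theta):
    """Distribution of the randomized decision nu(T(x)) under the same model."""
    dist = {}
    for t in range(n + 1):
        prob_t = comb(n, t) * theta ** t * (1 - theta) ** (n - t)
        for d, q in nu(t, n).items():
            dist[d] = dist.get(d, Fraction(0)) + prob_t * q
    return dist
```

For any θ strictly between 0 and 1, `law_of_mu(n, theta)` and `law_of_nu(n, theta)` agree in this example, which is the sense in which μ and ν are identically distributed.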

The notion of euclidean space is not essential to the result just described. If 
D is a Borel set of a euclidean space, there exists a one-to-one mapping of D 
into the real line which takes the (Borel) measurable sets of D into Borel sets 
and conversely. The proof depends only on this last property, and, therefore, 
applies equally well to any measurable space D which possesses it. This remark 
applies with little or no modification to all euclidean space conditions stated 
in this section. 

It might be argued that if in the given case there does exist a one-to-one meas- 
urability-preserving mapping of D into the real line, one might as well take the 
real line to be the decision space. This is perfectly feasible in principle, but in 
many cases (such as (iii) below) the real line is an unnatural representation 
which obscures the problem itself, as well as the results obtainable by applica- 
tion of various general theorems. The analogous remark applies to the possi- 
bility of taking the sample space to be the real line if, say, X is a euclidean space. 

The following are some special cases of the general result. Let T be a
sufficient statistic.

(i). Testing hypotheses. Let P_0 be a subset of P, and let H_0 be the hypothesis
that the unknown distribution p is an element of P_0. For any procedure μ for
testing H_0, let α_p(μ) be the probability, corresponding to p, of rejecting
H_0. Regarded as a function of p, α_p(μ) is called the power function of μ. By
letting the two decisions “accept H_0” and “reject H_0” correspond (say) to the
real numbers 0 and 1, respectively, it follows that, corresponding to any test
procedure based on x, there exists one based on y which has the same power
function (cf. [2], p. 320).

(ii). Point estimation. Let θ = θ(p) be a real parameter. Then corresponding
to any estimation procedure for θ based on x, there exists one based on y such
that, for each p, the two procedures yield identically distributed estimates.

(iii). Confidence interval estimation. For any system μ of confidence intervals
for θ, let I_μ(x) be the interval corresponding to x, let l_μ(x) be the length
of I_μ(x), and let α_p(μ) be the probability, corresponding to p, that I_μ(x)
covers θ(p). Take D to be the set of all pairs (u, v) with −∞ < u < ∞ and
0 < v < ∞, and let the point (u, v) of D correspond to the decision that the
unknown value of θ lies in the interval with center u and length v. It then
follows from the general result that, corresponding to any system μ based on x,
there exists a system ν based on y such that, for each p, the lengths l_μ(x)
and l_ν(y) are identically distributed, and α_p(μ) = α_p(ν).

(iv). “Information.” The classical contention concerning sufficiency, namely,
that a sufficient statistic contains all the available information concerning
the unknown actual distribution, can be interpreted as follows (cf. [1],
pp. 239-241, and [2], p. 320). If the observed outcome in a given instance is
x, and the statistician is supplied only with the observed value y = T(x) of
the sufficient statistic T, he could, if he wished, calculate (with the aid of
a random machine) a hypothetical outcome x* in such a way that x* and x are
identically distributed irrespective of the actual distribution of x. In other
words, there exists a randomized function of y whose values are statistically
indistinguishable from the outcomes x. Hence, knowledge of the observed value
of T is equivalent to knowledge of the observed outcome itself. By taking
D = X and μ(x) = x in the general result, we see that a sufficient condition
for the validity of this interpretation is that X be a subset of a euclidean
space. It now follows that this last is an alternative sufficient condition for
the validity of the sufficient statistic reduction of a decision problem.

Supposing that, in the given case, sufficient statistics do reduce the decision
problem, it is of interest to determine a statistic, if any, which affords the
maximum reduction. This question is not to be confused with the equivalence of
any two sufficient statistics, which follows from the equivalence of any
sufficient statistic with the outcome itself. The problem here is to determine,
if possible, a sufficient statistic T* such that, for any sufficient statistic
T, the class of decision functions based on T*(x) is included in the class of
those based on T(x).

A “strong” solution of this problem is available in the case when P is a
separable metric space under the metric d(p, q) = sup_A | p(A) − q(A) |. If X
is a subset of a euclidean space, separability is equivalent to domination. In
general, however, separability is a stronger condition than domination. (See
[14] and the last paragraph of Section 6.) Lehmann and Scheffé [2] showed that
in the above case there exists a sufficient statistic T* which is also
necessary. That is, if T is any sufficient statistic, then T* is a function of
T; clearly, T* affords the maximum reduction.
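For a finite family of distributions on a finite sample space, a necessary and sufficient statistic can be written down explicitly, and the following sketch may help to fix ideas. It relies on the standard fact that, for a finite family p_1, ..., p_k dominated by λ = p_1 + ... + p_k, the vector of densities p_i(x)/λ(x) is a sufficient statistic which is a function of every other sufficient statistic. This finite construction is offered as an illustration only and is not claimed to be the construction used in [2]; the family shown and the function names are hypothetical, and every sample point is assumed to have positive probability under at least one p_i.

```python
from fractions import Fraction


def likelihood_ratio_statistic(family, x):
    """T*(x): the vector of densities p_i(x) / lambda(x), with lambda = sum of the p_i."""
    total = sum(p[x] for p in family)  # assumed positive for every x
    return tuple(p[x] / total for p in family)


def induced_partition(statistic, points):
    """Group the sample points by the value of the statistic."""
    blocks = {}
    for x in points:
        blocks.setdefault(statistic(x), []).append(x)
    return sorted(blocks.values())


# Hypothetical family of two distributions on X = {0, 1, 2, 3}.
points = [0, 1, 2, 3]
p1 = {0: Fraction(1, 8), 1: Fraction(1, 8), 2: Fraction(1, 4), 3: Fraction(1, 2)}
p2 = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4)}
family = [p1, p2]

partition = induced_partition(lambda x: likelihood_ratio_statistic(family, x), points)
```

In this example the points 0 and 1 have the same density vector and are therefore not distinguished by T*, while 2 and 3 each form a block of their own.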

An alternative (and possibly better) solution is obtained here for the case
when D may be taken to be a subset of a euclidean space and P is a dominated
set of measures. It turns out that in this case there exists a class C of
decision functions such that C is equivalent, in the sense of the preceding
paragraphs, to the class of all decision functions based on x, and such that,
for any sufficient statistic T, this class C is included in the class of all
decision functions based on T(x) (Theorems 6.2, 7.1, and Lemma 7.1).

The reduction of a statistical decision problem to decision functions based on
a sufficient statistic is, of course, only one of the reductions available to
the mathematical statistician. Others, which apply in contexts somewhat more
specific than the present one, are the reduction to non-randomized decision
functions (cf. [3], also [4] and [5]), and the reduction to invariant decision
functions (cf. [6] and [7]). Some interesting results (e.g., the theorem of Rao
[8] and Blackwell [9] concerning unbiased minimum variance estimation) can be
obtained by combining the sufficiency reduction with one or both of the others
mentioned. In many special cases successive application of the sufficiency,
nonrandomization, and invariance reductions in this order solves the decision
problem, that is to say, determines a decision function which is “best” in the
class of all decision functions. These considerations are, however, outside the
scope of this paper.

Now consider the sequential case. Let x = (x_1, x_2, ...) be a sequence of
chance variables, let X be the set of all possible sequences x, and suppose as
before that x is distributed in X according to an unknown one of a certain set
P of probability measures p. For each m let X_(m) denote the set of all
truncated sequences x_(m) = (x_1, ..., x_m), and let T_m be a function on
X_(m). Then T_1, T_2, ..., is said to be a sufficient sequence if, for each m,
T_m is a sufficient statistic for the possible distributions of x_(m).

Let D_(1), D_(2), ... be a sequence of (terminal) decision spaces. A sequential
decision function consists of a sampling procedure and a terminal decision
procedure. A sampling procedure is a set of rules for taking observations
x_1, x_2, ... one by one on the components of x. The number of components
observed in a given instance is called the sample size and is denoted by n. In
using a given procedure, n need not be specified in advance; at each stage the
decision whether or not the sampling is to be continued may depend on the
sample values available at that stage. A terminal decision procedure is a set
of rules for employing, when the sampling has terminated, the observed values
x_1, x_2, ..., x_n to select some one decision, called the terminal decision,
from the given set D_(n). In most applications, such as testing hypotheses
concerning p or estimating parameters θ = θ(p), one has
D_(1) = D_(2) = ... = D_(m) = ..., but there are cases where the set of
possible terminal decisions does depend on the stage at which sampling is
terminated.

Let μ and μ* be sequential decision functions, and let n(x) and n*(x) be the
sample sizes and d(x) and d*(x) the terminal decisions, according to μ and μ*,
respectively, corresponding to the sequence x of outcomes. Then μ and μ* are
said to be equivalent if (i) for each p, the sample sizes n(x) and n*(x) are
identically distributed, and (ii) for each p and m, the conditional
distribution of d(x) given n(x) = m is identical with the conditional
distribution of d*(x) given n*(x) = m.

Suppose first that the sampling operation is not under the control of the
statistician, but that he is to be presented with a sample obtained according
to some specified procedure and asked to select the terminal decision. In this
case, two terminal decision procedures are said to be equivalent if the
sequential decision functions obtained by combining them with the given
sampling procedure are equivalent. If T_1, T_2, ... is a sufficient sequence,
and each D_(m) may be taken to be a subset of a euclidean space, it is shown
that corresponding to any terminal decision procedure there exists an
equivalent terminal decision procedure which has the following structure: if
in a given instance the sampling terminates at the mth stage, the terminal
decision depends only on the observed value of T_m (m = 1, 2, ...)
(Theorem 10.1).

Suppose now that the statistician is free to choose the sampling procedure as
well as the terminal decision procedure. In this case, the above result affords
only a partial justification of sufficiency. A more complete justification is
provided by the following result. If T_1, T_2, ... is a sufficient sequence, if
each D_(m) may be taken to be a subset of a euclidean space, and if the
experimental framework is regular in a certain sense, then, corresponding to
any sequential decision function, there exists an equivalent one which has the
following structure: when the first m observations have been taken, the
decision whether or not sampling is to be continued depends only on the
observed value of T_m (m = 1, 2, ...); and if in a given instance the sampling
is terminated at the mth stage, the terminal decision depends only on the
observed value of T_m (m = 1, 2, ...) (Theorem 10.3). The hypothesis of
regularity is shown to be essential to this result (Example 9.6). An explicit
characterization of regular frameworks is not obtained here. It is shown,
however, that a sufficient condition for regularity is that x_1, x_2, ... be a
sequence of independent chance variables for each p, and the set of possible
distributions of x_(m) be dominated for each m (Theorem 11.5).
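The structure described in the preceding paragraph can be made concrete by a small sketch. It assumes observations taking the values 0 and 1, with T_m = x_1 + ... + x_m; the particular stopping boundary, the cap `max_n`, and the function name are hypothetical choices made only for illustration. Both the decision to continue sampling and the terminal decision are computed from the stage m and the value of T_m alone.

```python
def run_sequential(observations, max_n=5):
    """Return (sample size n, terminal decision d) for one sequence of 0/1 observations."""
    total = 0
    for m, x_m in enumerate(observations, start=1):
        total += x_m                      # T_m = x_1 + ... + x_m
        lead = 2 * total - m              # successes minus failures so far
        if abs(lead) >= 2 or m == max_n:  # sampling rule: depends on (m, T_m) only
            return m, int(lead > 0)       # terminal rule: depends on (m, T_m) only
    raise ValueError("sequence ended before the sampling rule stopped")
```

For instance, `run_sequential([1, 1, 0, 0, 0])` stops at the second stage, since the first two observations already give a lead of two successes.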

By taking D_(m) = X_(m) for each m in the results described above, one can
obtain certain interpretations of the statement that “In sequential
experimentation, a sufficient sequence contains all the available
information.” These interpretations are given in the initial paragraphs of
Section 8, which form an alternative introduction to the main results in the
sequential case, and may be read before Sections 2 through 7.


2. Some definitions. Let X be a set of points x. A class S of subsets of X is
a (Borel) field if S contains X, if A ∈ S implies (X − A) ∈ S, and if A_i ∈ S
for i = 1, 2, ... implies ∪_i A_i ∈ S. We shall have frequent occasion to
consider simultaneously more than one field of subsets of the same set X; the
definitions which follow take this situation into account.
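On a finite set the three conditions in the definition of a field can be verified mechanically, which may be useful when constructing small examples. The sketch below is restricted to finite X, an assumption of the example, in which case closure under pairwise unions already gives closure under countable unions. The function name `is_field` is hypothetical.

```python
from itertools import combinations


def is_field(X, S):
    """Check the defining conditions for a class S of subsets of a finite set X."""
    X = frozenset(X)
    S = {frozenset(A) for A in S}
    if X not in S:                       # S must contain X
        return False
    if any(X - A not in S for A in S):   # closed under complements
        return False
    # On a finite set, pairwise unions suffice for countable unions.
    return all(A | B in S for A, B in combinations(S, 2))


coarse = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
not_closed = [set(), {1}, {1, 2, 3}]
```

Here `is_field({1, 2, 3, 4}, coarse)` holds, while `is_field({1, 2, 3}, not_closed)` fails because the complement of {1} is missing.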

Let S be a field of subsets of X. A set A ⊂ X is S-measurable if A ∈ S; a
real-valued function f on X is S-measurable if for every real r the set
{x: f(x) < r} is S-measurable. Henceforth, functions with unspecified ranges
are understood to be real-valued. For any set A, the characteristic function of
A is denoted by χ_A, that is, χ_A(x) = 1 for x ∈ A and = 0 for x ∈ (X − A).
Clearly, a set A is S-measurable if and only if χ_A(x) is an S-measurable
function.

A measure on S is a nonnegative and countably additive function of the
S-measurable sets. A measure m on S is σ-finite (on S) if there exists a
sequence A_1, A_2, ... of S-measurable sets such that m(A_i) < ∞ for each i and
∪_i A_i = X; it is a finite measure if m(X) < ∞, and is a probability measure
if m(X) = 1. A function f on X is S-m-integrable if f is an S-measurable
function of x and ∫_X f(x) dm exists and is finite. A set A ⊂ X is S-m-null if
A is S-measurable and m(A) = 0.


For each x ∈ X let π(x) be a statement concerning x. We write π(x) [S, m] if
there exists an S-m-null set N such that π(x) is true for each x ∈ (X − N).
Thus the statements f(x) = g(x) [S, m] and 0 ≤ f(x) ≤ 1 [S, m] mean,
respectively, that the sets {x: f(x) ≠ g(x)} and {x: f(x) < 0 or f(x) > 1} are
subsets of S-m-null sets.

A measure m on S is absolutely continuous with respect to another measure
n on S if every S-n-null set is also S-m-null; we then write m ≪ n. We write
dm = f(x) dn if f is a nonnegative S-measurable function such that
m(A) = ∫_A f(x) dn for every A ∈ S. The Radon-Nikodym theorem states that if n
is a σ-finite measure, then m ≪ n if and only if there exists an f such that
dm = f(x) dn.
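In the simplest case, where X is finite and S consists of all subsets of X, the density in the Radon-Nikodym theorem is just a pointwise ratio, and absolute continuity can be checked point by point. The sketch below assumes this finite setting, with measures represented as dictionaries of point masses; the function name `density` is hypothetical.

```python
from fractions import Fraction


def density(m, n):
    """Return f with dm = f dn on a finite set, or None if m is not absolutely continuous w.r.t. n."""
    f = {}
    for x in n:
        if n[x] == 0:
            if m[x] != 0:       # an n-null point that is not m-null
                return None
            f[x] = Fraction(0)  # any value is admissible on an n-null set
        else:
            f[x] = Fraction(m[x]) / Fraction(n[x])
    return f


m = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
n = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
f = density(m, n)
```

In this example m ≪ n and the density equals 2 on a and b and 0 on c, whereas `density(n, m)` returns `None`, since the point c is m-null but not n-null.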

Now let M be a set of measures on S. A set is S-M-null if it is S-measurable
and of m-measure zero for each m ∈ M; a function is S-M-integrable if it is
S-m-integrable for each m ∈ M. The statement π(x) [S, M] means that there
exists an S-M-null set N such that π(x) is true for each x ∈ (X − N). The set M
is said to be dominated if there exists a fixed σ-finite measure λ such that
each measure in M is absolutely continuous with respect to λ; we then say that
M is dominated by λ and write M ≪ λ. It is easy to see that domination by a
σ-finite measure is equivalent to domination by a finite or even a probability
measure (cf. [1], p. 232).

A field S_0 of subsets of X such that S_0 ⊂ S, that is, such that every
S_0-measurable set is also S-measurable, is said to be a subfield of S. It is
pointed out in the following section that the relations between the total
outcome and a statistic which are of interest to us can be studied conveniently
in terms of certain corresponding relations between the basic field and a
subfield of it. Meanwhile, we note several facts concerning S_0 and S.

A measure m on S is also a measure on S_0. An S_0-m-null set is S-m-null.
An S_0-m-integrable function f is S-m-integrable and
(S_0) ∫_X f(x) dm = (S) ∫_X f(x) dm, where, as the notation suggests, the left
and right integrals are taken over the measure spaces (X, S_0, m) and
(X, S, m) respectively; in such a case ∫_X f(x) dm will usually denote the
integral taken over (X, S, m).
x 


Let m and n be measures on S. If m = n on S, then m = n on S_0. If m ≪ n
on S, then m ≪ n on S_0. If n is σ-finite on S_0, then n is σ-finite on S. If a
set M of measures on S is dominated on S, then M is dominated on S_0. If M
is complete ([2], p. 311) on S, then M is complete on S_0. The converses of
these five propositions are not true in general.

Let S_1 and S_2 be subfields of S, and let M be a set of measures on S. We
write S_1 ⊂ S_2 [S, M] if, corresponding to each set A ∈ S_1, there exists a
set B ∈ S_2 such that the symmetric difference of the two sets, that is, the
set (A ∩ [X − B]) ∪ ([X − A] ∩ B), is S-M-null. Since the characteristic
function of the symmetric difference of A and B is | χ_A(x) − χ_B(x) |, it is
easily seen that S_1 ⊂ S_2 [S, M] if and only if corresponding to each A ∈ S_1
there exists a B ∈ S_2 such that χ_A(x) = χ_B(x) [S, M]. We write
S_1 = S_2 [S, M] if both S_1 ⊂ S_2 [S, M] and S_2 ⊂ S_1 [S, M].

The statement S_1 ⊂ S_2 [S, M] means, of course, that (relative to the given
set M) S_1 is essentially a subfield of S_2. Here “essentially” refers to a
rather weak null set condition. A stronger condition is that there exists a
fixed S-M-null set, say N, such that to each set A_1 ∈ S_1 there corresponds a
set A_2 ∈ S_2 such that A_1 − N = A_2 − N. There is also a weaker null set
condition, namely that S_1 ⊂ S_2 [S, m] for each m ∈ M. The condition
S_1 ⊂ S_2 [S, M] is, however, exactly the one which we require (cf. Lemma 7.1).


3. Statistics and subfields. Let there be given a set X of points x, a field S
of subsets of X, and a set P of probability measures on S. The framework X, S,
P will remain fixed throughout the discussion. In a statistical context, X is
the set of all possible outcomes of the experiment, and S is the class of all
sets A such that the event “x ∈ A” has a well defined probability p(A), where p
is some (unknown) one of the measures in P. In this context, (X, S) is called
the sample space, and x is said to be distributed in (X, S) according to p.

A statistic is a function (with arbitrary range) of x. Let y = T(x) be a
statistic, and let Y be the range of T. For any B ⊂ Y let
T^{-1}(B) = {x: T(x) ∈ B}, and let 𝐓 be the class of all sets B such that
T^{-1}(B) is an S-measurable subset of X. It is easy to see that 𝐓 is a field,
and that the event “y ∈ B” has a well defined probability,
p(T^{-1}(B)) = pT^{-1}(B) say, if and only if B is a 𝐓-measurable set. Thus y
is distributed in (Y, 𝐓) according to pT^{-1}. Let Q be the set of all
measures pT^{-1} corresponding to p in P.

DEFINITION 3.1. T is a sufficient statistic for P if corresponding to each
S-measurable set A there exists a 𝐓-Q-integrable function g_A(y) such that for
all B ∈ 𝐓 and p ∈ P

\[ \int_{A \cap T^{-1}(B)} dp = \int_B g_A(y)\, dpT^{-1}. \]


This definition is equivalent to the one given by Lehmann and Scheffé [2].
Now we shall consider an alternative approach to the concept of sufficiency.
As an immediate consequence of Lemmas 1 and 3 of [1], and of the present
definition of 𝐓, we have

LEMMA 3.1. Let g be a function on Y. Then g(y) is 𝐓-measurable if and only if
gT(x) {= g[T(x)]} is an S-measurable function of x; also g(y) is
𝐓-Q-integrable if and only if gT(x) is S-P-integrable, in which case for each
p ∈ P,

\[ \int_X gT(x)\, dp = \int_Y g(y)\, dpT^{-1}. \]
x Y 


The class S_0 [= T^{-1}(𝐓)] of all sets T^{-1}(B), with B ∈ 𝐓, is a subfield
of S; we shall call it the subfield induced by the statistic T. By applying the
first part of Lemma 3.1 to Lemma 2 of [1], one obtains the following useful
result.

LEMMA 3.2. Let f be an S-measurable function on X. A necessary and sufficient
condition that f(x) be S_0-measurable is that there exist a function g on Y
such that f(x) = gT(x).
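When X is finite and S consists of all subsets of X, the subfield induced by T is generated by the level sets of T, and Lemma 3.2 reduces to the statement that f is constant on each level set exactly when it factors through T. The sketch below assumes this finite setting; the function name `factor_through` and the example functions are hypothetical. It either returns the function g, as a dictionary on the range of T, or reports that no such g exists.

```python
def factor_through(T, f, X):
    """Return g (a dict on the range of T) with f(x) = g(T(x)) for all x in X, or None."""
    g = {}
    for x in X:
        y = T(x)
        if y in g and g[y] != f(x):
            return None  # f takes two different values on one level set of T
        g[y] = f(x)
    return g


X = [-2, -1, 0, 1, 2]
square = lambda x: x * x                                   # the statistic T(x) = x^2
g_even = factor_through(square, lambda x: x ** 4 + 1, X)   # an even function of x
g_odd = factor_through(square, lambda x: x, X)             # not a function of x^2
```

The even function x^4 + 1 factors through T(x) = x^2, while the identity function does not, since it separates the points x and −x that T identifies.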

An important property of S_0 is that for each p the measure spaces (X, S_0, p)
and (Y, 𝐓, pT^{-1}) are isomorphic ([10], p. 167), the isomorphism being
independent of p. Consequently, explicit consideration of the sample space
(Y, 𝐓) of the values y of T, and of the possible distributions
Q = {pT^{-1}: p ∈ P} of y, is not essential to the study of T. An equivalent
procedure is to study the possible distributions P of x in the reduced sample
space (X, S_0). For example, the set Q of measures on 𝐓 is dominated if and
only if P is dominated on S_0, and Q is complete on 𝐓 if and only if P is
complete on S_0.

The evident notational simplifications which result from studying a statistic
in terms of the subfield induced by it suggest the possibility of taking a
sufficient subfield rather than a sufficient statistic to be the basic concept
in the formal exposition. We can (and in the sequel, shall) proceed as follows.
Given X, S, and P, an arbitrary subfield S_0 of S is said to be sufficient for
P if corresponding to each S-measurable set A there exists an S_0-P-integrable
function g_A such that

\[ \int_{A \cap A_0} dp = \int_{A_0} g_A(x)\, dp \quad \text{for } A_0 \in S_0,\ p \in P. \]
Ag 


In addition to notational simplicity, this alternative approach to sufficiency 
has a number of other technical advantages. 

(i). It entails no loss of generality; the definitions and results concerning
an arbitrary subfield S_0 can be translated into corresponding definitions and
results concerning an arbitrary statistic T by supposing that S_0 is induced by
T and applying Lemmas 3.1 and 3.2. For example, a statistic T is sufficient for
P if and only if the subfield induced by T is sufficient for P. On the other
hand, it is not known whether every subfield is inducible by a statistic.

[While this paper was in process of publication, answers to some of the
questions raised here were obtained by several workers, including E. L. Lehmann
and the writer. This work is contained in two notes (entitled “Two comments on
‘Sufficiency and Statistical Decision Functions’” and “Statistics and
Subfields”) which are to appear soon.]

(ii). It is easier to establish certain results for subfields than to establish 
the corresponding results for statistics. Moreover, assuming that in the given 
case the results in question are available for both statistics and subfields, the 
results for subfields are at least as useful as those for statistics. 

For example, it can be shown rather easily that if P is dominated, a subfield 
which is necessary and sufficient for P exists. Proof of the corresponding result 
for statistics is more complicated and requires the stronger assumption that P 
is separable (cf. Sec. 6). The advantage in question is due to the considerations 
that the class of sufficient subfields includes the class of subfields induced by 
sufficient statistics (see (i) above), and that certain relations between subfields 
are (at least apparently) weaker than the corresponding relations between 
statistics. 

The following is an illustration of this last consideration. Let T_1 and T_2 be
statistics and let S_1 and S_2 be the subfields induced by them. It is easy to
see that if there exists a function F on the range of T_2 into that of T_1 such
that T_1(x) = F[T_2(x)], then S_1 is a subfield of S_2. It is not known whether
the converse is true in general.







(iii). Our primary concern is not the properties of particular statistics but
the reduction of statistical decision problems by the sufficiency principle. It
is therefore desirable, if not logically necessary, to define sufficiency as
directly as possible in terms of X, S, and P; the subfield definition is closer
to this requirement than the statistic definition. This reason for preferring
the subfield definition seems at least as compelling as the reasons why, in the
theory of testing hypotheses, a “test” is described without reference to any
statistic as a measurable subset of the sample space. It would be even more
compelling if it should turn out that there can exist subfields which are
necessary and sufficient (or even sufficient) but which are not induced by any
statistic.

(iv). Finally, the quite simple notation and conditional expectation machinery 
(cf. the following section) which we use for the study of subfields in the non- 
sequential case prove to be sufficient for the corresponding study in the se- 
quential case. The study of statistics in the sequential case requires a compli- 
cated and very cumbersome notation. 

For all these reasons the following sections are written mainly in terms of
subfields. It should be observed that the definitions and results of Halmos and
Savage [1] refer not to a statistic as defined here and in [2] but to the more
flexible notion of a measurable transformation. The difference is the
following. Let T be a statistic on X onto Y, let 𝐓 be defined as before, and
let 𝐓_0 be any subfield of 𝐓; then T is a measurable transformation of (X, S)
into (Y, 𝐓_0). Thus a statistic corresponds, in general, to more than one
measurable transformation. L. J. Savage points out in this connection that
there is a good non-mathematical reason for taking the class of measurable
subsets of Y to be 𝐓 rather than any smaller class 𝐓_0: the latter procedure
is inconsistent with the generally accepted view that a statistic is a mapping.
For example, if T(x) = x then Y = X, and 𝐓 = S, but if T is regarded as a
transformation of (X, S) into (Y, S_1) where S_1 contains only X and the empty
set, it becomes equivalent to the statistic T_1(x) = 0 (say).

Since a measurable transformation T of (X, S) into (Y, 𝐓_0) induces the
subfield T^{-1}(𝐓_0), and a subfield S_0 is induced by the transformation
I(x) = x of (X, S) into (X, S_0), the notion of a measurable transformation is
completely equivalent to that of a subfield. The subfield notation is, however,
simpler (cf. (iv) above) and has certain psychological advantages (cf. (iii)
above). When first submitted for publication, this paper was written in terms
of measurable transformations. It has since been rewritten in the subfield
terminology at the suggestion of L. J. Savage and of a referee of the paper.

We conclude this section with two heuristic interpretations of the notion of
sufficient subfield. (i). “The given class of sets of interest is S, but if S_0
is sufficient, the statistician could, without disadvantage, take the
(generally much smaller) class S_0 to be the class of all sets which are of
interest to him.” To make this more specific, corresponding to each fixed x,
let E^x be the event that the outcome lies in the common part of all
S-measurable sets containing x, and let E_0^x be the event that the outcome
lies in the common part of all S_0-measurable sets containing x. Then (ii) “If
S_0 is sufficient, a statistician who knows only which of the events E_0^x has
occurred is as well off as one who knows which of the events E^x has occurred.”
Now, in most sample spaces, E^x is the event that the outcome be x. Also, if
S_0 is induced by an S-measurable statistic T, then E_0^x is the event that the
observed value of T be T(x). It follows from the last two statements that, in
many cases, the specific interpretation (ii) coincides with the interpretation
of “sufficient statistic” described in the introduction.


4. Conditional expectation. Let S_0 be a subfield of S. Consider a particular
probability measure p on S, and let f(x) be an S-p-integrable function. It
follows from the Radon-Nikodym theorem for signed measures that there exists
an S_0-p-integrable function, g(x) say, such that for all A_0 ∈ S_0

\[ \int_{A_0} g(x)\, dp = \int_{A_0} f(x)\, dp, \]

and that g is essentially unique in the sense that an S_0-p-integrable function
g* satisfies the same relation if and only if g*(x) = g(x) [S, p] (see [10],
p. 128). Since g* and g are S_0-measurable, the stated null set condition is
trivially equivalent to g*(x) = g(x) [S_0, p]. For notational simplicity, we
shall usually state null set conditions in such cases in terms of S-measurable
sets. We write g(x) = E_p(f(x) | S_0).

It is assumed henceforth that for any probability measure p on S, any
S-p-integrable function f(x), and any subfield S_0 of S, the corresponding
E_p(f(x) | S_0) is a definite (but unspecified) S_0-p-integrable function of x
such that for all A_0 ∈ S_0,

\[ \int_{A_0} E_p(f(x) \mid S_0)\, dp = \int_{A_0} f(x)\, dp. \tag{4.1} \]


This E,(f(x) | So) is called the conditional expectation function of f given So 
and p, and a particular value F,(f(x) | So) of this function is called the con- 
ditional expectation of f given S), x, and p. If f(z) = x4(x), we may replace 
“expectation of f” by “‘probability of A”’ in these terms concerning F,(f(x) | So). 
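As an illustration only, the defining relation (4.1) can be checked by hand on a finite sample space, where a subfield is generated by a partition and the conditional expectation is the probability-weighted average of $f$ over each block. The sketch below assumes a four-point space; the names `cond_exp`, `integral`, and `measurable_sets`, and the particular values of `p` and `f`, are hypothetical and not part of the paper.

```python
from fractions import Fraction as F
from itertools import combinations

def cond_exp(f, p, partition):
    """Return a determination of E_p(f | S0), S0 being generated by `partition`."""
    g = {}
    for block in partition:
        mass = sum(p[x] for x in block)
        # On a p-null block any constant is admissible; 0 is an arbitrary choice.
        value = sum(f[x] * p[x] for x in block) / mass if mass else F(0)
        for x in block:
            g[x] = value
    return g

def integral(h, p, A):
    """Integral of h over the set A with respect to p (a finite sum here)."""
    return sum(h[x] * p[x] for x in A)

def measurable_sets(partition):
    """Every union of blocks, i.e. every set of the generated subfield."""
    for k in range(len(partition) + 1):
        for blocks in combinations(partition, k):
            yield [x for block in blocks for x in block]

# Assumed example data on X = {0, 1, 2, 3}.
p = {0: F(1, 8), 1: F(3, 8), 2: F(1, 4), 3: F(1, 4)}
f = {0: F(4), 1: F(0), 2: F(1), 3: F(3)}
partition = [(0, 1), (2, 3)]

g = cond_exp(f, p, partition)
# Relation (4.1): the integrals of g and f agree on every S0-measurable set.
holds = all(integral(g, p, A) == integral(f, p, A)
            for A in measurable_sets(partition))
```

Exact rational arithmetic is used so that the equality in (4.1) can be tested without rounding error.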

The fact that in general a conditional expectation function is uniquely
determined only up to a null set can lead to certain rather trivial but persistent
notational complications. Many propositions concerning a subfield $S_0$ which are of
interest to us can be stated in terms of the existence of conditional expectation
functions which satisfy certain conditions for each fixed $x$ when regarded as
functions of $f$ and $p$ (see [1], p. 230). The complications referred to arise from an
explicit consideration of the possibly different determinations which satisfy
different or increasingly strong conditions of this type. The actual determinations
which satisfy special conditions are, however, of little interest to the
theory, and we can and shall avoid the difficulty by studying their existence
and other properties in terms of the fixed determination $\{E_p(f(x) \mid S_0)\}$.

It can be seen from Lemmas 3.1 and 3.2 that the relation between conditional
expectation with respect to a subfield, and the more familiar notion of conditional
expectation given the value of a statistic, is the following. Let $T$ be a statistic,
and let $(Y, \mathbf{T})$ be the sample space of the values of $T$ (Section 3). Let $f(x)$ be an
$S$-$p$-integrable function. Regarding $T$ as a transformation of $(X, S)$ into $(Y, \mathbf{T})$,
let $g(y)$ be the conditional expectation of $f$ given $T(x) = y$ and $p$ ([10], p. 209);
then $E_p(f(x) \mid T^{-1}(\mathbf{T})) = gT(x)$ $[S, p]$. In other words, if $S_0$ is the subfield
induced by a statistic $T$, then $E_p(f(x) \mid S_0)$, which depends on $x$ only through $T$,
is the conditional expectation of $f$ given that the value of $T$ is $T(x)$ and that the
outcome is distributed according to $p$. This relation supports (cf. the final
paragraph of the preceding section) the following intuitive description of
conditional expectations with respect to $S_0$: for each $x$, $E_p(f(x) \mid S_0)$ is the
conditional expected value of the random variable $f$ given that the outcome lies
in the common part of all $S_0$-measurable sets containing $x$ and that the
outcome is distributed according to $p$.

Now we shall list some properties of conditional expectations which are 
required subsequently. Most of these properties are well known, and all are 
easy consequences of the defining relation (4.1). 

Lemma 4.1. If $f$ is $S$-$p$-integrable, then

$$\int_X E_p(f(x) \mid S_0)\,dp = \int_X f(x)\,dp.$$

Lemma 4.2. Let $f_1$ and $f_2$ be $S$-$p$-integrable functions. If $f_1(x) \le f_2(x)$ $[S, p]$, then
$E_p(f_1(x) \mid S_0) \le E_p(f_2(x) \mid S_0)$ $[S, p]$. If $f_1(x) = f_2(x)$ $[S, p]$, then
$E_p(f_1(x) \mid S_0) = E_p(f_2(x) \mid S_0)$ $[S, p]$.

Lemma 4.3. If $f$ is $S$-measurable and $c \le f(x) \le d$ $[S, p]$ with $-\infty < c \le d < \infty$,
then $c \le E_p(f(x) \mid S_0) \le d$ $[S, p]$.

Lemma 4.4. If $f_1, \dots, f_m$ are $S$-$p$-integrable functions, and $c_1, \dots, c_m$ are
constants, then

$$E_p\Big(\sum_{i=1}^{m} c_i f_i(x) \,\Big|\, S_0\Big) = \sum_{i=1}^{m} c_i E_p(f_i(x) \mid S_0) \quad [S, p].$$

Lemma 4.5. Let $f_1, f_2, \dots$ be a sequence of $S$-$p$-integrable functions such that
$f_m(x) \le f_{m+1}(x)$ $[S, p]$ for $m = 1, 2, \dots$, and $\sup_m \{f_m(x)\}$ is $S$-$p$-integrable.
Then

$$\sup_m \{E_p(f_m(x) \mid S_0)\} = E_p(\sup_m \{f_m(x)\} \mid S_0) \quad [S, p].$$

Lemma 4.6. If $f(x)$ is $S_0$-$p$-integrable, then

$$E_p(h(x) \cdot f(x) \mid S_0) = E_p(h(x) \mid S_0) \cdot f(x) \quad [S, p]$$

for every $h$ such that $h(x)$ and $h(x) \cdot f(x)$ are $S$-$p$-integrable; in particular,
$E_p(f(x) \mid S_0) = f(x)$ $[S, p]$.

Lemma 4.7. If $f(x)$ is $S$-$p$-integrable and $E_p(h(x) \cdot f(x) \mid S_0) = E_p(h(x) \mid S_0) \cdot f(x)$
$[S, p]$ for every $h$ such that $h(x)$ and $h(x) \cdot f(x)$ are $S$-$p$-integrable, then $f$ differs from
an $S_0$-measurable function on an $S$-$p$-null set; in fact

$$f(x) = E_p(f(x) \mid S_0) \quad [S, p].$$







Lemma 4.8. Let $f$ be an $S$-$p$-integrable function, and let $S_1$ and $S_2$ be subfields
of $S$. If $S_1 \subseteq S_2$ $[S, p]$, then

$$E_p(f(x) \mid S_1) = E_p(E_p(f(x) \mid S_2) \mid S_1) \quad [S, p].$$

If $S_1 = S_2$ $[S, p]$, then $E_p(f(x) \mid S_1) = E_p(f(x) \mid S_2)$ $[S, p]$.


The proofs of these lemmas are omitted. 
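Although the proofs are omitted, the iterated-expectation property of Lemma 4.8 and the linearity of Lemma 4.4 are easy to confirm numerically in the finite case. The following sketch assumes a six-point sample space with every point of positive probability, and represents the subfields $S_1 \subseteq S_2$ by nested partitions; the helper `cond_exp` and all data are hypothetical choices made for illustration.

```python
from fractions import Fraction as F

def cond_exp(f, p, partition):
    """E_p(f | S0) on a finite space, S0 generated by `partition` (all blocks non-null)."""
    g = {}
    for block in partition:
        mass = sum(p[x] for x in block)
        value = sum(f[x] * p[x] for x in block) / mass
        for x in block:
            g[x] = value
    return g

# Assumed data: six points, strictly positive masses summing to one.
p = {0: F(1, 12), 1: F(1, 6), 2: F(1, 4), 3: F(1, 6), 4: F(1, 4), 5: F(1, 12)}
f = {x: F(x * x) for x in p}
h = {x: F(1 - x) for x in p}

S2 = [(0, 1), (2, 3), (4, 5)]      # finer subfield
S1 = [(0, 1, 2, 3), (4, 5)]        # coarser subfield, contained in S2

# Lemma 4.8: conditioning on S2 first and then on S1 equals conditioning on S1.
direct = cond_exp(f, p, S1)
iterated = cond_exp(cond_exp(f, p, S2), p, S1)

# Lemma 4.4: linearity, checked here for the combination 2f - 3h.
combo = {x: 2 * f[x] - 3 * h[x] for x in p}
lhs = cond_exp(combo, p, S2)
ef, eh = cond_exp(f, p, S2), cond_exp(h, p, S2)
rhs = {x: 2 * ef[x] - 3 * eh[x] for x in p}
```

Because no block is null here, the equalities hold pointwise rather than merely up to a null set.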


5. Sufficiency. Necessity. 

Definition 5.1. A field $S_0 \subseteq S$ is sufficient for the measures $P$ on $S$ (briefly, $S_0$
is sufficient for $P$) if corresponding to each $S$-measurable set $A$, there exists an
$S_0$-measurable function $\varphi_A(x)$ such that

$$\varphi_A(x) = E_p(\chi_A(x) \mid S_0) \quad [S, p] \text{ for each } p \text{ in } P. \tag{5.1}$$

This definition is readily verified as equivalent to the one given (without explicit
reference to conditional probabilities) in Section 3. The sufficiency of $S_0$ is
equivalent to the existence of a determination of the conditional probability
functions with respect to $S_0$ such that, for each $x \in X$ and $A \in S$, the conditional
probability of $A$ given $S_0$, $x$, and $p$ is the same for each $p$ in $P$.

It is easy and instructive to investigate the sufficiency of the extremal
subfields of $S$, namely $S$ itself and the one which contains only $X$ and the empty
set, say $S_*$. By Lemma 4.6, for any $S$-measurable set $A$, we have
$\chi_A(x) = E_p(\chi_A(x) \mid S)$ $[S, p]$ for each $p$ in $P$, so that $S$ is sufficient for $P$. The field $S_*$
is more interesting. A function $g(x)$ is $S_*$-measurable if and only if $g$ takes a
constant value, and hence for any $S$-measurable set $A$ and any $p$, we have
$E_p(\chi_A(x) \mid S_*) = p(A)$ for all $x$, by Lemma 4.1. Consequently, $S_*$ is sufficient
if and only if the set $P$ of measures on $S$ consists of only one measure. These
facts concerning $S$ and $S_*$ are intuitively obvious, as may be seen by turning
to the last paragraph of Section 3 and the fourth paragraph of Section 4.
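In the finite case Definition 5.1 reduces to a simple test, sketched below under stated assumptions: a subfield generated by a partition is sufficient exactly when, on every block, all measures that give the block positive mass induce the same conditional distribution. The sample space (two Bernoulli trials), the three parameter values, and the names `conditional`, `is_sufficient`, and `bernoulli_pair` are hypothetical and chosen only to illustrate the remarks on the extremal subfields.

```python
from fractions import Fraction as F
from itertools import product

def conditional(p, block):
    """Conditional distribution of p on `block`, or None if the block is p-null."""
    mass = sum(p[x] for x in block)
    if mass == 0:
        return None
    return tuple(p[x] / mass for x in block)

def is_sufficient(partition, measures):
    """Finite-case sufficiency: the non-null conditional laws agree on each block."""
    for block in partition:
        laws = {c for c in (conditional(p, block) for p in measures) if c is not None}
        if len(laws) > 1:
            return False
    return True

X = list(product((0, 1), repeat=2))

def bernoulli_pair(theta):
    """Joint law of two independent trials with success probability theta."""
    return {x: theta ** sum(x) * (1 - theta) ** (2 - sum(x)) for x in X}

P = [bernoulli_pair(F(1, 4)), bernoulli_pair(F(1, 2)), bernoulli_pair(F(2, 3))]

finest = [(x,) for x in X]                      # the field S itself
trivial = [tuple(X)]                            # only X and the empty set
by_sum = [tuple(x for x in X if sum(x) == s) for s in (0, 1, 2)]
by_first = [tuple(x for x in X if x[0] == b) for b in (0, 1)]

results = {
    "finest": is_sufficient(finest, P),
    "trivial": is_sufficient(trivial, P),
    "trivial, one measure": is_sufficient(trivial, P[:1]),
    "by_sum": is_sufficient(by_sum, P),
    "by_first": is_sufficient(by_first, P),
}
```

The partition by the number of successes is sufficient, while the partition by the first trial alone is not, in line with the usual interpretation of a sufficient statistic.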

Theorem 5.1. When $R$ is the real line, and $\mathbf{R}$ the class of Borel sets of $R$, the
following statements are mutually equivalent.

(i) $S_0$ is sufficient for $P$.

(ii) Corresponding to each $S$-$P$-integrable function $f(x)$, there exists an
$S_0$-$P$-integrable function $g(x)$ such that

$$g(x) = E_p(f(x) \mid S_0) \quad [S, p] \text{ for each } p \in P. \tag{5.2}$$

(iii) Corresponding to each function $\mu(B, x)$ defined for $B \in \mathbf{R}$ and $x \in X$ such
that $\mu$ is $S$-$P$-integrable for each $B$ and a measure on $\mathbf{R}$ for each $x$, there exists a
function $\nu(B, x)$ such that $\nu$ is $S_0$-$P$-integrable for each $B$ and a measure on $\mathbf{R}$ for
each $x$, and such that

$$\nu(B, x) = E_p(\mu(B, x) \mid S_0) \quad [S, p] \text{ for each } B \in \mathbf{R} \text{ and } p \in P. \tag{5.3}$$


Proof. We shall show that (i) $\to$ (ii) $\to$ (iii) $\to$ (i). Suppose then that (i)
holds. Let $F$ be the class of all $S$-$P$-integrable functions $f$ such that (5.2) is
satisfied by some $S_0$-$P$-integrable $g$. By hypothesis, $F$ contains all $S$-measurable
characteristic functions. An application of Lemma 4.4 shows that $F$ contains
all $S$-measurable simple functions ([10], p. 84). Since every nonnegative
$S$-measurable function is the limit of a nondecreasing sequence of $S$-measurable simple
functions, it follows by means of Lemma 4.5 that $F$ contains all nonnegative
$S$-$P$-integrable functions. Hence, by change of sign, $F$ contains all nonpositive
$S$-$P$-integrable functions. Since each $S$-$P$-integrable function is the sum of a
nonnegative $S$-$P$-integrable function and a nonpositive one, it follows by means
of Lemma 4.4 that $F$ is the class of $S$-$P$-integrable functions. Thus (ii) holds.

Now let there be given a $\mu$ which is a measure on $\mathbf{R}$ for each $x$ and an
$S$-$P$-integrable function for each $B$. [The argument which follows is a straightforward
generalization of J. L. Doob's argument for the existence of a conditional
probability measure on the real line, that is, for the case when $X = R$, $S = \mathbf{R}$, $P$
contains only one measure $p$ (so that every subfield is sufficient), $\mu(B, x) = \chi_B(x)$,
and $S_0$ is the subfield induced by a measurable transformation. See [11], pp. 30
and 623 and [10], p. 210, Exer. 5.] For any $r$, $-\infty < r < \infty$, let the open
interval $(-\infty, r)$ be denoted by $I_r$, and define $f(r, x) = \mu(I_r, x)$. Let $K$ be an
enumerable everywhere dense subset (e.g., the set of rational points) of $R$. It
follows from (ii) that corresponding to each $k$ in $K$ there exists an $S_0$-$P$-integrable
function, $g^*(k, x)$ say, such that for each $p \in P$

$$g^*(k, x) = E_p(f(k, x) \mid S_0) \quad [S, p]. \tag{5.4}$$

Let $k_1, k_2, \dots$ be an enumeration of $K$, and for any $i$ and $j$, let
$a(i, j) = \min \{k_i, k_j\}$ and $b(i, j) = \max \{k_i, k_j\}$. Since $a(i, j) \le b(i, j)$, we have
$f(a(i, j), x) \le f(b(i, j), x)$ for all $x$. It follows easily from (5.4) by means of
Lemma 4.2 that

$$g^*(a(i, j), x) \le g^*(b(i, j), x) \quad [S_0, P], \qquad i, j = 1, 2, \dots.$$

Hence $g^*(k, x)$ is a nondecreasing function of $k$ $[S_0, P]$.

For each $m = 1, 2, \dots$ let $u_m = \min \{k_1, k_2, \dots, k_m\}$. Then $u_1 \ge u_2 \ge \cdots$
and $\lim u_m = -\infty$. Since $\mu(R, x)$ is $S$-$P$-integrable, $\mu(R, x) < \infty$ $[S, P]$. Hence
([10], p. 179)

$$f(u_1, x) \ge f(u_2, x) \ge \cdots, \qquad \lim f(u_m, x) = 0 \quad [S, P].$$

Let $\alpha^*(x) = \inf_m \{g^*(u_m, x)\}$. It follows from (5.4) by application of Lemmas
4.5 and 4.3 that $\alpha^*(x) = 0$ $[S_0, P]$. Let $\alpha(x) = \inf_m \{g^*(k_m, x)\}$. The conclusion
of the preceding paragraph implies that $\alpha(x) = \alpha^*(x)$ $[S_0, P]$. Hence
$\alpha(x) = 0$ $[S_0, P]$.

Now let $v_m = \max \{k_1, k_2, \dots, k_m\}$ and write $\beta(x) = \sup_m \{g^*(k_m, x)\}$ and
$\beta^*(x) = \sup_m \{g^*(v_m, x)\}$. Arguments similar to the ones used above show that
$\beta^*$ is $S_0$-$P$-integrable, and that $\beta^*(x) = \beta(x)$ $[S_0, P]$. Hence $\beta(x) < \infty$ $[S_0, P]$.

The preceding three paragraphs show that there exists an $S_0$-$P$-null set, $N$
say, such that for $x \in (X - N)$, $g^*(k, x)$ is a nondecreasing function of $k$ with
$\alpha(x) = 0$, $\beta(x) < \infty$. Define $g(k, x) = g^*(k, x)$ for $x \in (X - N)$ and $= 0$ for
$x \in N$. Then for each $x$, $g$ is a nondecreasing function of $k$ with $\inf_k g = 0$ and
$\sup_k g < \infty$, and for each $k$, $g$ is an $S_0$-measurable function of $x$. Also
$g(k, x) = g^*(k, x)$ for all $k \in K$ $[S, P]$, so that for each $k \in K$ and $p \in P$

$$g(k, x) = E_p(f(k, x) \mid S_0) \quad [S, p] \tag{5.5}$$

by (5.4) and Lemma 4.2. Corresponding to each point $r$ in $R$, let $c_1(r), c_2(r), \dots$
be a fixed strictly increasing sequence of points in $K$ such that $\lim c_m(r) = r$,
and define $h(r, x) = \lim g(c_m(r), x)$. For each $x$, $h$ is a nondecreasing,
left-continuous function of $r$ with $\inf_r h = 0$ and $\sup_r h < \infty$. Moreover, we have
$f(c_1(r), x) \le f(c_2(r), x) \le \cdots$, with $\lim f(c_m(r), x) = f(r, x)$ for each $x$. Hence,
for each $r$, and for each $p \in P$,

$$h(r, x) = E_p(f(r, x) \mid S_0) \quad [S, p] \tag{5.6}$$

by (5.5), the definition of $h$, and Lemma 4.5.

For each $x$, let $\nu(B, x)$ be the finite measure on $\mathbf{R}$ such that $\nu(I_r, x) = h(r, x)$
for all $r$ ([10], p. 179). Since $h$ is an $S_0$-measurable function of $x$ for each $r$, it can
be shown (cf. [11], p. 364) that $\nu$ is $S_0$-measurable for each $B$. Let $p \in P$ and
$A_0 \in S_0$ be fixed and define

$$m(B) = \int_{A_0} \mu(B, x)\,dp, \qquad n(B) = \int_{A_0} \nu(B, x)\,dp.$$

Then $m$ and $n$ are measures on $\mathbf{R}$ with $m(R) < \infty$, and $m(I_r) = n(I_r)$ for all $r$,
by (5.6) and (4.1). Hence ([10], p. 179), $m(B) = n(B)$ for all $B \in \mathbf{R}$. Since $p$
and $A_0$ are arbitrary, we conclude that

$$\int_{A_0} \mu(B, x)\,dp = \int_{A_0} \nu(B, x)\,dp, \qquad A_0 \in S_0,\ B \in \mathbf{R},\ p \in P. \tag{5.7}$$

Since $\nu$ is $S_0$-measurable for each $B$, it follows from (5.7) by the uniqueness
assertion of the Radon-Nikodym theorem that (5.3) is satisfied. It is evident
that $\nu$ is $S_0$-$P$-integrable for each $B$. Since $\mu$ is arbitrary, it follows that (iii)
holds.

Now let $a$ and $b$ be arbitrary but fixed points of $R$ with $a \ne b$. Let $A$ be an
$S$-measurable set, and define

$$\mu(B, x) = \alpha(B)\chi_A(x) + \beta(B)(1 - \chi_A(x)),$$

where

$$\alpha(B) = \begin{cases} 1 & \text{if } a \in B, \\ 0 & \text{otherwise;} \end{cases} \qquad \beta(B) = \begin{cases} 1 & \text{if } b \in B, \\ 0 & \text{otherwise.} \end{cases}$$

Then $\mu$ is a probability measure on $\mathbf{R}$ for each $x$ and an $S$-measurable function
for each $B$. Hence by (iii) there exists a $\nu$ such that $\nu$ is $S_0$-measurable for each
$B$ and (5.3) holds. It follows by taking $B = \{a\}$ in (5.3) that (5.1) is satisfied
by $\varphi_A(x) = \nu(\{a\}, x)$. Since in this argument $A \in S$ is arbitrary, (i) holds. This
completes the proof of Theorem 5.1.

Remark. Let $R_0$ be a Borel set of $R$ such that $R_0$ contains at least two points,
and let $\mathbf{R}_0$ be the class of Borel sets of $R_0$. Let (iii)* denote statement (iii) with
$\mathbf{R}$ replaced by $\mathbf{R}_0$ and "measure" by "finite measure," and let (iii)** denote
statement (iii)* with "finite measure" replaced by "probability measure." Then
each of the statements (iii)* and (iii)** is also equivalent to (i). This follows
immediately from Theorem 5.1 and from its proof.

Corollary 5.1. If $S_0$ is sufficient for the measures $P$ on $S$, and $S_{00}$ is sufficient
for the measures $P$ on $S_0$, then $S_{00}$ is sufficient for the measures $P$ on $S$.

This intuitively obvious result is an easy consequence of Lemmas 4.2 and
4.8, and the equivalence of statements (i) and (ii) of Theorem 5.1; the formal
proof is omitted.

Definition 5.2. A field $S^* \subseteq S$ is said to be necessary for the measures $P$ on $S$
(briefly, $S^*$ is necessary for $P$) if $S^* \subseteq S_0$ $[S, P]$ for each field $S_0 \subseteq S$ which is
sufficient for $P$.

The subfield $S_*$, consisting of only $X$ and the empty set, is evidently necessary
for $P$. A less trivial result is the following: If $P$ is the set of all probability
measures on $S$ (more generally, if $P$ is complete on $S$), then $S$ itself is necessary for $P$.
That the converse is not true in general is shown by the following example.
Let $X$ consist of the three points 0, 1, and 2, let $S$ be the class of all subsets of $X$,
and let $P$ consist of the two measures $p$ and $q$, where $p(\{0\}) = p(\{1\}) = \tfrac{1}{2} =
q(\{1\}) = q(\{2\})$ and $p(\{2\}) = 0 = q(\{0\})$. Then $S$ is necessary for $P$, but $P$
is not even complete on $S$.
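The three-point example is small enough to verify exhaustively. The sketch below assumes the finite-case reading of sufficiency (agreement of conditional distributions on every block of a partition, for the measures that charge the block), lists all five partitions of $X$, and exhibits a nonzero function with zero expectation under both measures. The helper names and the witness function `f` are hypothetical.

```python
from fractions import Fraction as F

p = {0: F(1, 2), 1: F(1, 2), 2: F(0)}
q = {0: F(0), 1: F(1, 2), 2: F(1, 2)}
P = [p, q]

def conditional(m, block):
    """Conditional distribution of m on `block`, or None if the block is m-null."""
    mass = sum(m[x] for x in block)
    return None if mass == 0 else tuple(m[x] / mass for x in block)

def is_sufficient(partition, measures):
    """Finite-case sufficiency check for the subfield generated by `partition`."""
    for block in partition:
        laws = {c for c in (conditional(m, block) for m in measures) if c is not None}
        if len(laws) > 1:
            return False
    return True

# The five partitions of {0, 1, 2}, i.e. all subfields of S.
partitions = [
    [(0,), (1,), (2,)],
    [(0, 1), (2,)],
    [(0, 2), (1,)],
    [(0,), (1, 2)],
    [(0, 1, 2)],
]
sufficient = [part for part in partitions if is_sufficient(part, P)]

# Every point has positive mass under p or q, so the only P-null set is empty.
# Hence S is necessary as soon as S itself is the only sufficient subfield.

# Incompleteness: f is not zero, yet its expectation vanishes under p and q.
f = {0: F(1), 1: F(-1), 2: F(1)}
expectations = [sum(f[x] * m[x] for x in m) for m in P]
```

Only the finest partition survives the check, which agrees with the claim that $S$ is necessary although $P$ is not complete.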

The study of necessity alone is, however, of little interest to us, and we shall 
be concerned mainly with subfields which are necessary and also sufficient for P. 
The special role of such subfields in the reduction of statistical decision problems 
is described in Section 7. 


6. The dominated case. It is assumed in this section that $P$ is dominated on $S$.
Let $P_0 = \{p_1, p_2, \dots\}$ be a countable subset of $P$ such that every $S$-$P_0$-null
set is also $S$-$P$-null. The existence of such a $P_0$ is assured by Lemma 7 of [1].
Choose a corresponding sequence $c_1, c_2, \dots$ of positive constants such that
$\sum_i c_i = 1$, and define

$$\lambda_0(A) = \sum_i c_i \cdot p_i(A). \tag{6.1}$$

Theorem 6.1. As defined, $\lambda_0$ is a probability measure on $S$ such that

(i) $P \ll \lambda_0$ on $S$;

(ii) Each $S$-$P$-null set is $S$-$\lambda_0$-null;

(iii) A necessary and sufficient condition that a subfield $S_0$ be sufficient for the
measures $P$ on $S$ is that corresponding to each $p$ in $P$, there exist a nonnegative
$S_0$-measurable function $g_p$ such that $dp = g_p(x)\,d\lambda_0$ on $S$.

This theorem, which is basic to the results of this section, differs from Theorem
1 of [1] in that $\lambda_0$ is a fixed measure satisfying (i) and (ii) such that (iii) holds
for any subfield $S_0$. As a matter of fact, the proof in [1] of Theorem 1 of [1] also
proves Theorem 6.1, so a separate proof need not be given here. The measure
$\lambda_0$ does not, of course, necessarily belong to the given set $P$.

Another useful result which is implicit in [1] is the following version of the 
Fisher-Neyman factorization theorem for sufficient statistics. 







Corollary 6.1. Let there be given a $\sigma$-finite measure $\lambda$ on $S$ such that $P \ll \lambda$.
A necessary and sufficient condition that a statistic $T$ be sufficient for $P$ is that there
exist a nonnegative function $h$ on $X$ and a set $\{g_p : p \in P\}$ of nonnegative functions
on the range of $T$ such that

(a) $h(x)$ is an $S$-measurable function;

(b) For each $p$, $g_pT(x)$ is an $S$-measurable function;

(c) For each $p$, $dp = h(x) \cdot g_pT(x)\,d\lambda$ on $S$.

Proof. It will suffice to prove the corresponding result for a subfield $S_0$;
the corollary as stated will then follow by supposing that $S_0$ is induced by $T$
and applying Lemma 3.2. Suppose first that $S_0$ is sufficient for $P$. Then, by
property (iii) of $\lambda_0$, there exist nonnegative $S_0$-measurable functions $g_p$ such
that, for each $p$, $dp = g_p(x)\,d\lambda_0$ on $S$. The hypothesis $P \ll \lambda$ and property (ii)
of $\lambda_0$ imply that $\lambda_0 \ll \lambda$; hence there exists a nonnegative $S$-measurable function
$h$ such that $d\lambda_0 = h(x)\,d\lambda$ on $S$. It follows that for each $p$, $dp = h(x) \cdot g_p(x)\,d\lambda$
on $S$.

Conversely, suppose that the last stated relations hold, where $h$ is a
nonnegative $S$-measurable function and each $g_p$ is a nonnegative $S_0$-measurable
function. It then follows from (6.1) that $d\lambda_0 = h(x) \cdot k(x)\,d\lambda$ on $S$, where $k$ is a
nonnegative $S_0$-measurable function. Hence, for each $p$, $dp = g_p^*(x)\,d\lambda_0$ on $S$,
where $g_p^*(x)$ is a nonnegative $S_0$-measurable function; $g_p^*(x) = g_p(x)/k(x)$ if
$k(x) > 0$, and is 0, say, otherwise. It follows from property (iii) of $\lambda_0$ that $S_0$
is sufficient for $P$. This completes the proof. An alternative method of proof
is to deduce the necessity of the condition from Theorem 6.2 of [2] and its
sufficiency from Corollary 4 of [1].

Although Corollary 1 of [1] (when stated in terms of statistics), Theorem
6.2 of [2], and Corollary 6.1 are equivalent versions of the factorization theorem
in the sense that any one of them implies the other two, they differ from each 
other in form. Theorem 6.2 of [2] is a “simplification” (and extension to the 
case when the given dominating measure is not necessarily finite) of Corollary 
1 of [1]. Corollary 6.1 is a “simplification” of Theorem 6.2 of [2]. 

An example of contexts in which Corollary 6.1 is more useful than the other
versions follows. Let $X$ be the $m$-dimensional euclidean space of points
$x = (x_1, \dots, x_m)$, let $S$ be the field of Borel sets of $X$, and let $P = \{p_\theta : 0 < \theta < \infty\}$
where $p_\theta$ is the probability measure corresponding to $x_1, x_2, \dots$ and $x_m$ being
independently and uniformly distributed in the open interval $(0, \theta)$. Let
$T(x) = \max \{x_1, \dots, x_m\}$ and let the dominating measure $\lambda$ be $m$-dimensional
Lebesgue measure. It is desired to verify that $T$ is a sufficient statistic by
exhibiting a suitable factorization of the probability densities with respect to $\lambda$.
Each $p_\theta$ has the representation $dp_\theta = h(x) \cdot g_\theta T(x)\,d\lambda$ where $h(x) = 1$ if
$\min \{x_1, \dots, x_m\} > 0$ and is 0 otherwise, and $g_\theta(r) = (1/\theta)^m$ if $0 < r < \theta$ and
is 0 otherwise. The desired result follows immediately from Corollary 6.1.
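The factorization in this example can be spot-checked numerically. The sketch below compares the joint density of $m$ independent uniform variables on $(0, \theta)$ with the product $h(x) \cdot g_\theta(T(x))$ at a handful of test points, including points outside the support; the function names and the test points are hypothetical, and exact rational arithmetic is assumed so that equality can be tested directly.

```python
from fractions import Fraction as F

def joint_density(x, theta):
    """Lebesgue density of m independent uniform(0, theta) coordinates at x."""
    d = F(1)
    for xi in x:
        d *= (1 / theta) if 0 < xi < theta else F(0)
    return d

def h(x):
    """Common factor: 1 when every coordinate is positive, 0 otherwise."""
    return F(1) if min(x) > 0 else F(0)

def g(theta, r, m):
    """Factor depending on x only through r = T(x) = max(x)."""
    return (1 / theta) ** m if 0 < r < theta else F(0)

def factorized(x, theta):
    return h(x) * g(theta, max(x), len(x))

# Assumed test points in three dimensions, inside and outside the support.
points = [
    (F(1, 2), F(3, 2), F(1)),
    (F(1, 2), F(5, 2), F(1)),     # maximum exceeds theta = 2
    (F(-1, 2), F(3, 2), F(1)),    # one negative coordinate
    (F(-1), F(-2), F(-3)),        # maximum is negative
]
thetas = [F(1), F(2), F(3)]

agree = all(joint_density(x, t) == factorized(x, t) for x in points for t in thetas)
```

Note that `h` is not integrable with respect to Lebesgue measure, which is the feature of this factorization discussed in the text that follows.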

Corollary 1 of [1] and Theorem 6.2 of [2] do not apply to the simple and almost
inevitable factorization used in the above example because $h$ is not integrable
with respect to $\lambda$. (In general, even if the common factor of the given
factorization happens to be integrable with respect to the dominating measure, the
verification of this fact may be troublesome.) The only practical method of
establishing the sufficiency of $T$ by means of, say, Theorem 6.2 of [2], seemingly
is to begin with the above representation of the measures in $P$ and then discard
$\lambda$ as the dominating measure and pass to a more suitable one, that is, to a
measure $\lambda_0$ for which it is easier to see that $dp_\theta = h^*(x) \cdot g_\theta^* T(x)\,d\lambda_0$ for each $\theta$, where
$h^*$ and $g_\theta^* T$ are Borel measurable functions of $x$, and $h^*$ is integrable with
respect to $\lambda_0$. This method, which is not entirely consistent with the spirit of the
factorization theorem, is exactly the one we have used to prove the sufficiency
part of Corollary 6.1.

By property (i) of $\lambda_0$, corresponding to each $p$ in $P$ there exists a nonnegative
$S$-measurable function, $g_p$ say, such that on $S$

$$dp = g_p(x)\,d\lambda_0. \tag{6.2}$$

Let $A_p(r) = \{x : g_p(x) < r\}$, and let $S^*$ be the field generated by the sets $A_p(r)$,
that is, the smallest field of subsets of $X$ which contains all sets $A_p(r)$ with
$0 < r < \infty$ and $p$ in $P$. Since each $A_p(r)$ is an $S$-measurable set, it is clear that
$S^*$ is a subfield of $S$.

Theorem 6.2. $S^*$ is necessary and sufficient for $P$.

Proof. By definition of $S^*$, each of the functions $g_p$ is $S^*$-measurable;
consequently, by (6.2) and by property (iii) of $\lambda_0$, $S^*$ is sufficient for $P$. To show
that $S^*$ is necessary, let $S_0$ be a subfield which is sufficient for $P$. By property
(iii) of $\lambda_0$, corresponding to each $p$ in $P$, there exists a nonnegative $S_0$-measurable
function $h_p$ such that $dp = h_p(x)\,d\lambda_0$ on $S$. It follows from (6.2) by the essential
uniqueness of density functions that $g_p(x) = h_p(x)$ $[S, \lambda_0]$ for each $p$ in $P$.

Let $C$ be the class of all sets $A$ such that $\chi_A(x) = \chi_B(x)$ $[S, P]$ for some $B \in S_0$.
It is easy to see that $C$ contains $X$, that $A \in C$ implies $(X - A) \in C$, and that
$A_i \in C$ for $i = 1, 2, \dots$ implies $\bigcup_i A_i \in C$, so that $C$ is a field. Now, it follows
from the conclusion of the preceding paragraph, using property (i) of $\lambda_0$, that
each of the sets $A_p(r)$ is in $C$, the $S_0$-measurable set corresponding to $A_p(r)$
being $B_p(r) = \{x : h_p(x) < r\}$. Since $S^*$ is the smallest field containing the sets
$A_p(r)$, we conclude that $S^* \subseteq C$. It now follows from the definition of $C$ that
$S^* \subseteq S_0$ $[S, P]$. Since $S_0$ is arbitrary, $S^*$ is shown to be necessary for $P$, and
Theorem 6.2 is proved.

A method for constructing the necessary and sufficient subfield for a dominated
set of measures is given in the paragraphs preceding the statements of Theorems
6.1 and 6.2. We shall illustrate this method by a simple example. Let $X$ be the
$m$-dimensional sample space of points $x = (x_1, \dots, x_m)$, let $S$ be the field of
Borel sets of $X$, and let $P = \{p_\theta : -\infty < \theta < \infty\}$, where $p_\theta$ is the probability
measure corresponding to $x_1, \dots$ and $x_m$ being independent normal variables
each with mean $\theta$ and variance unity. Let $P_0$ consist of the one measure $p_0$.
Then $\lambda_0 = p_0$, and each $p_\theta$ has the representation $dp_\theta = g_\theta(x)\,d\lambda_0$, where
$g_\theta(x) = \exp \{-m[\theta^2 - 2\theta T(x)]/2\}$ and $T(x) = \sum_i x_i/m$. A simple computation
now shows that the sets $A_p(r)$ are $X$, the empty set, and all sets $\{x : T(x) < r\}$
and $\{x : T(x) > r\}$ with $-\infty < r < \infty$. The field generated by these sets is
easily seen to be $T^{-1}(\mathbf{R})$, that is the class of all sets $\{x : T(x) \in B\}$ with $B \in \mathbf{R}$,
where $\mathbf{R}$ is the class of Borel sets of the real line. Hence $T^{-1}(\mathbf{R})$ is necessary and
sufficient for $P$.
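The density $g_\theta$ in this example is the likelihood ratio of the normal sample with mean $\theta$ against the one with mean 0. As a rough numerical check, the sketch below compares that ratio with the closed form $\exp\{-m[\theta^2 - 2\theta T(x)]/2\}$ and confirms that two samples with the same mean receive the same value. The function names and sample values are hypothetical, and floating-point comparison with a tolerance is assumed.

```python
import math

def normal_joint(x, theta):
    """Joint density of independent N(theta, 1) coordinates at the point x."""
    m = len(x)
    return (2 * math.pi) ** (-m / 2) * math.exp(-sum((xi - theta) ** 2 for xi in x) / 2)

def g(theta, x):
    """Closed form of dp_theta / dp_0, which depends on x only through the mean T(x)."""
    m = len(x)
    T = sum(x) / m
    return math.exp(-m * (theta ** 2 - 2 * theta * T) / 2)

# Assumed sample points; x1 and x2 share the same mean 1.6 / 3.
x1 = (0.3, -1.2, 2.5)
x2 = (1.6, 0.0, 0.0)
thetas = (-2.0, -0.5, 0.7, 1.5)

ratio_matches = all(
    math.isclose(normal_joint(x1, t) / normal_joint(x1, 0.0), g(t, x1), rel_tol=1e-9)
    for t in thetas
)
same_mean_same_value = all(
    math.isclose(g(t, x1), g(t, x2), rel_tol=1e-9) for t in thetas
)
```

Since $g_\theta(x)$ is strictly monotone in $T(x)$ for each $\theta \ne 0$, the sets $\{x : g_\theta(x) < r\}$ are half-lines in $T$, which is the "simple computation" referred to above.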

In the preceding example, $T$ is a sufficient statistic for $P$, so that the result
obtained is a special case of the following theorem, which is stated here without
proof.

Theorem 6.3. If $T$ is a sufficient statistic for $P$, there exists a field $\mathbf{T}_0$ of subsets
of the range of $T$ such that the subfield $T^{-1}(\mathbf{T}_0)$ is necessary and sufficient for $P$.

Theorem 6.4. Let $S_0$ and $S_{00}$ be subfields of $S$ such that $S_{00} \subseteq S_0$ $[S, P]$. If $S_{00}$
is sufficient for $P$, then so is $S_0$.

This result is the converse of Corollary 5.1. The corresponding result for
statistics is: "If $T$ is sufficient, and $T$ is essentially a function of $U$, then $U$ is
sufficient."

Proof. Suppose that $S_{00}$ is sufficient, and consider an arbitrary but fixed
$p$ in $P$. It follows from property (iii) of $\lambda_0$ that there exists a nonnegative
$S_{00}$-measurable function $g_p$ such that $dp = g_p(x)\,d\lambda_0$ on $S$. It follows from the
hypothesis $S_{00} \subseteq S_0$ $[S, P]$ by an obvious argument (cf. Lemma 7.1) that there
exists a nonnegative $S_0$-measurable function, $h_p$ say, such that $h_p(x) = g_p(x)$ $[S, P]$.
Hence $h_p(x) = g_p(x)$ $[S, \lambda_0]$, by property (ii) of $\lambda_0$. Hence $dp = h_p(x)\,d\lambda_0$ on $S$.
Since $p$ is arbitrary, it follows from property (iii) of $\lambda_0$ that $S_0$ is sufficient, and
the proof is complete.

Corollary 6.2. Let $S^*$ and $S_0$ be subfields of $S$, and suppose that $S^*$ is necessary
and sufficient for $P$. Then

(i) $S_0$ is necessary for $P$ if and only if $S_0 \subseteq S^*$ $[S, P]$;

(ii) $S_0$ is sufficient for $P$ if and only if $S^* \subseteq S_0$ $[S, P]$;

(iii) $S_0$ is necessary and sufficient for $P$ if and only if $S_0 = S^*$ $[S, P]$.

The only part of Corollary 6.2 which does not follow immediately from
Definitions 5.1 and 5.2 is the 'if' part of (ii), and this part is a consequence of
Theorem 6.4. It follows from this remark that, except for the 'if' part of (ii),
Corollary 6.2 is independent of the present assumption that $P$ is dominated.

A consequence of Theorem 6.2, Corollary 6.2 (ii), and Lemma 3.2 is: "Given
a sample space of possible outcomes and a dominated set of possible distributions
of the outcome, there exists an inherent class of events such that a statistic $T$
is sufficient for $P$ if and only if each event in this class depends on the outcome
essentially only through $T$."

A statistic $T^*$ is said to be necessary for $P$ if, for each sufficient statistic $T$,
$T^*$ is essentially a function of $T$, that is, there exists a function $F$ on the range
of $T$ into that of $T^*$ such that $T^*(x) = F(T(x))$ $[S, P]$. A statistic which is
necessary and sufficient is then a 'minimal sufficient statistic' in the terminology of
Lehmann and Scheffé [2]. As stated in the introduction, they proved the existence
of such a statistic in the case when, in addition to being dominated, the set $P$
is separable under the metric $d(p, q) = \sup_{A \in S} |p(A) - q(A)|$. Let
$Q = \{q_1, q_2, \dots\}$ be a countable everywhere dense subset of $P$, and let
$dq_i/d\lambda_0 = \varphi_i(x)$ for $i = 1, 2, \dots$. Then $T^*(x) = (\varphi_1(x), \varphi_2(x), \dots)$ is a necessary and
sufficient statistic for $P$. This construction is based on Theorem 6.1 and Corollary
6.1; it differs from, but is necessarily equivalent to, the one given in [2].

Three unsolved problems, which appear to be of some theoretical interest, 
are: 1) Whether Theorem 6.4 is valid in the general case. 2) Whether a necessary 
and sufficient subfield exists in the general case. 3) The exact relations between 
the notion of necessary and sufficient statistic and of necessary and sufficient 
subfield. For example, does the existence of a necessary and sufficient subfield 
always imply the existence of a necessary and sufficient statistic, and if so, is 
the subfield induced by a necessary and sufficient statistic always a necessary 
and sufficient subfield? 

It is perhaps relevant to problem 3 that there do exist dominated sets which
are not separable. Consider an uncountably infinite collection of independent
binomial trials, each with probability $\tfrac{1}{2}$ of success. Let the trials be indexed,
$\{\mathcal{E}_\theta : \theta \in \Omega\}$ say, and let $X$ denote the set of all possible outcomes of the
collection of trials. For each $\theta$ let $E(\theta) \subseteq X$ be the event that $\mathcal{E}_\theta$ results in success,
and let $S$ be the field generated by the sets $E(\theta)$. As is well known, there exists
a probability measure on $S$, say $\lambda$, such that $\lambda[\bigcap_{i=1}^{k} E(\theta_i)] = (\tfrac{1}{2})^k$ for any finite
set $\theta_1, \theta_2, \dots, \theta_k$ of indices with $\theta_i \ne \theta_j$. For each $\theta$ define $p_\theta(A) = 2\lambda(A \cap E(\theta))$
for $A \in S$, and let $P = \{p_\theta : \theta \in \Omega\}$. Clearly, $p_\theta \ll \lambda$ for each $\theta$, so that $P$ is a
dominated set of probability measures. We observe next that $\sup_{A \in S} |p_\theta(A) - p_\delta(A)|
= |p_\theta[E(\theta)] - p_\delta[E(\theta)]| = \tfrac{1}{2}$ for $\theta \ne \delta$. Since $\Omega$ is uncountable, it follows that
$P$ is an uncountably infinite set such that the distance between any two distinct
points is at least $\tfrac{1}{2}$. Hence $P$ is not separable. This example was communicated
to the author by L. J. Savage.


7. Sufficiency and statistical decision functions. Let there be given a measurable
space $(D, \mathbf{D})$, called the decision space, and suppose that the statistician is
required to construct a measurable procedure, called a decision function, for
associating each possible outcome $x$ with a point of $D$. Of several general methods
of constructing decision functions, we shall adopt the following one.

Let $\mu(C, x)$ be a function such that $\mu$ is a probability measure on $\mathbf{D}$ for each
$x$ and an $S$-measurable function of $x$ for each $C \in \mathbf{D}$. We then say that $\mu$ is an
$S$-measurable decision function. In using $\mu$ to arrive at a decision, the statistician
first obtains a particular outcome, say $x$. He then performs an experiment whose
outcome $\delta$ takes values in $(D, \mathbf{D})$ according to the known distribution $\mu(C, x)$.
He takes $\delta$ to be his decision.

The decision adopted by the statistician in a given instance is called the
terminal decision. We assume that when the outcome is distributed in $(X, S)$
according to $p$, the terminal decision in using $\mu$ is distributed in $(D, \mathbf{D})$ according
to $\lambda_p$, where

$$\lambda_p(C : \mu) = \int_X \mu(C, x)\,dp. \tag{7.1}$$







Two decision functions $\mu$ and $\nu$ are said to be equivalent if, for each $p$ in $P$,
$\lambda_p(C : \mu) = \lambda_p(C : \nu)$ for all $C \in \mathbf{D}$.

Let $R$ be the real line and let $\mathbf{R}$ be the class of Borel sets of $R$, as in Section 5.
Suppose that there exists a one-to-one mapping $\rho$ of $D$ into $R$ such that

$$C \in \mathbf{D} \text{ implies } \rho(C) \in \mathbf{R}, \qquad B \in \mathbf{R} \text{ implies } \rho^{-1}(B) \in \mathbf{D}. \tag{7.2}$$

We then say that $(D, \mathbf{D})$ is of type $(R, \mathbf{R})$. The justification for this terminology
is, of course, that in this case the decision space may, for theoretical purposes,
be taken to be $(R, \mathbf{R})$. It is assumed henceforth that $(D, \mathbf{D})$ is of type $(R, \mathbf{R})$.
As stated in the introduction to the paper, this assumption is valid if, in
particular, $D$ is a Borel set of the $m$-dimensional euclidean space and $\mathbf{D}$ is the class
of Borel sets of $D$ ($1 \le m \le \infty$). ([10], p. 159, Exer. 7)

Let $\rho$ be a one-to-one function on $D$ into $R$ such that (7.2) holds, and let
$\rho(D) = R_0$. Then $R_0$ is a Borel set. Let $\mathbf{R}_0$ be the class of Borel sets of $R_0$, that
is, all sets $B \cap R_0$ with $B \in \mathbf{R}$. Since $\mu(C) \to \mu^*(\rho(C))$ is a one-to-one
correspondence between probability measures on $\mathbf{D}$ and $\mathbf{R}_0$, it follows from Theorem
5.1 (cf. the remark following its proof) that a subfield $S_0$ is sufficient for $P$ if
and only if corresponding to each $S$-measurable decision function $\mu$ there exists
an $S_0$-measurable decision function $\nu$ such that, for each $C \in \mathbf{D}$ and $p \in P$,
$\nu(C, x)$ is the conditional expectation function of $\mu(C, x)$ given $S_0$ and $p$. An
application of Lemma 4.1 now yields

Theorem 7.1. If $S_0$ is sufficient for the measures $P$ on $S$, then corresponding to
each $S$-measurable decision function, there exists an equivalent $S_0$-measurable
decision function.

Theorem 7.1 is a rather special consequence of the apparently stronger result
preceding it. The proofs of the sequential analogues of Theorem 7.1 are much
more dependent on the corresponding apparently stronger results; nevertheless,
the question whether the present definition of sufficiency is stronger than is
necessary remains open (cf. [12]).
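The content of Theorem 7.1 can be illustrated on a finite sample space. In the sketch below, assumed for illustration only, the outcome consists of two Bernoulli trials, the decision space has two points, and the sufficient subfield is generated by the number of successes. Averaging a randomized decision function $\mu$ over each block gives a decision function $\nu$ that depends on the outcome only through that number, and the terminal decision has the same distribution (7.1) under every measure. All names and numerical values are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

X = list(product((0, 1), repeat=2))
D = ("a", "b")                                   # a two-point decision space

def bernoulli_pair(theta):
    """Joint law of two independent trials with success probability theta."""
    return {x: theta ** sum(x) * (1 - theta) ** (2 - sum(x)) for x in X}

P = [bernoulli_pair(F(1, 4)), bernoulli_pair(F(1, 2)), bernoulli_pair(F(2, 3))]
by_sum = [tuple(x for x in X if sum(x) == s) for s in (0, 1, 2)]

# An S-measurable decision function: mu[x][d] is the chance of deciding d at x.
mu = {
    (0, 0): {"a": F(1), "b": F(0)},
    (0, 1): {"a": F(1, 3), "b": F(2, 3)},
    (1, 0): {"a": F(1), "b": F(0)},
    (1, 1): {"a": F(0), "b": F(1)},
}

def smooth(mu, p, partition):
    """nu(d, x) = E_p(mu(d, x) | S0), computed block by block (blocks non-null)."""
    nu = {}
    for block in partition:
        mass = sum(p[x] for x in block)
        avg = {d: sum(mu[x][d] * p[x] for x in block) / mass for d in D}
        for x in block:
            nu[x] = avg
    return nu

def terminal_law(rule, p):
    """Distribution (7.1) of the terminal decision when the outcome follows p."""
    return {d: sum(rule[x][d] * p[x] for x in X) for d in D}

nu = smooth(mu, P[0], by_sum)
# Sufficiency: the same nu is obtained whichever measure is used for averaging.
same_nu = all(smooth(mu, p, by_sum) == nu for p in P)
equivalent = all(terminal_law(mu, p) == terminal_law(nu, p) for p in P)
```

If the averaging were done over a partition that is not sufficient, `smooth` would in general return different functions for different measures, and no single $\nu$ would serve for all of $P$.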

Lemma 7.1. Let $S_1$ and $S_2$ be arbitrary subfields of $S$. Let $c$ and $d$ be extended
real valued constants with $-\infty \le c < d \le \infty$. Then the following statements are
mutually equivalent:

(i) $S_1 \subseteq S_2$ $[S, P]$.
(ii) Corresponding to each $S_1$-measurable function $f$ such that $c \le f \le d$, there
exists an $S_2$-measurable function $g$ such that $c \le g \le d$ and $g(x) = f(x)$ $[S, P]$.
(iii) Corresponding to each $S_1$-measurable decision function $\mu$ there exists an
$S_2$-measurable decision function $\nu$ such that $\nu(C, x) = \mu(C, x)$ for all
$C \in \mathbf{D}$ $[S, P]$.

The proof of Lemma 7.1 is parallel to that of Theorem 5.1, and so is omitted.
From the equivalence of statements (i) and (iii) of Lemma 7.1 it follows that
if $S^*$ is necessary and sufficient for $P$, then the sufficiency of $S^*$ affords the
maximum possible reduction of the decision problem by means of Theorem 7.1.
It follows also, by first applying Theorem 7.1 to $S_{00}$, that if $S_{00}$ is sufficient,
and $S_{00} \subseteq S_0$ $[S, P]$, then $S_0$ is "sufficient" at least in the sense that the
conclusion of Theorem 7.1 is valid (cf. prob. 1 of Sec. 6, and the remark preceding
Lemma 7.1).


8. The sequential case. Two theorems in the statistic terminology. Let
$\mathcal{E}_1, \mathcal{E}_2, \dots$ be a sequence of experiments which are to be performed in the
order 1, 2, $\dots$ and let $x = (x_1, x_2, \dots)$ denote the sequence of outcomes in
case each of the experiments is carried out, $x_m$ being the outcome of $\mathcal{E}_m$ for
$m = 1, 2, \dots$. Let $X$ be the set of all possible sequences $x$, let $S$ be a given
field of subsets of $X$, and suppose that $x$ is distributed in $(X, S)$ according to $p$,
where $p$ is some (unknown) one of a certain set $P$ of probability measures on $S$.
It is convenient to suppose for the present that the sequence of experiments is
infinite, but no particular relation such as functional or stochastic independence
is assumed to hold between their individual outcomes. Thus $x = (x_1, x_2, \dots)$
is an infinite but otherwise quite arbitrary sequence of chance variables whose
joint distribution is given by $\Pr \{x \in A\} = p(A)$ for $A \in S$, where $p$ is some
member of $P$.

For each $m$, let $\tau_m(x_1, x_2, \dots) = (x_1, \dots, x_m)$ and let $(X_{(m)}, S_{(m)})$ be the
sample space of the values of $\tau_m$. (See Sec. 3.) For each $m$, let $T_m$ be a statistic
on $X_{(m)}$ and let $(Y_m, \mathbf{T}_m)$ be the sample space of the values of $T_m$. The typical
points of $X_{(m)}$ and $Y_m$ are denoted by $x_{(m)}$ and $y_m$ respectively. Then $x_{(m)}$ is
distributed in $(X_{(m)}, S_{(m)})$ according to $p\tau_m^{-1}$ and $y_m$ is distributed in $(Y_m, \mathbf{T}_m)$
according to $p\tau_m^{-1}T_m^{-1}$ for $m = 1, 2, \dots$. It is important to observe that these
statements refer to nonsequential sampling distributions.

It might appear at first sight that denoting the typical value of $T_m$ by $y_{(m)}$
and the set of all values of $T_m$ by $Y_{(m)}$ would be a better notational parallel with
$x_{(m)}$ and $X_{(m)}$, but this is not the case. A statistician who is supplied with the
outcome of each $\mathcal{E}_m$ possesses $x = (x_1, x_2, \ldots)$, while a statistician who is
supplied only with the observed value of each $T_m$ possesses $y = (y_1, y_2, \ldots)$.
In the present notation, the subscript $m$ refers to the $m$th coordinate and $(m)$
refers to the first $m$ coordinates of $x$ or $y$. Thus $(X_m, \mathbf{S}_m)$ denotes the sample
space of the $m$th coordinate of $x$, that is, of the outcome of $\mathcal{E}_m$; this space is not
required at present but it (or rather, the corresponding subfield of $\mathbf{S}$) appears
later in conditions (b) and (c) of Theorem 11.5. Again, $(Y_{(m)}, \mathbf{T}_{(m)})$ denotes
the sample space of the first $m$ coordinates of $y$. This last sample space is, however,
of little interest to us, one reason being that in many important cases
[e.g., $x_m$ real and $T_m(x_1, \ldots, x_m) = (x_1 + \cdots + x_m)/m$ for $m = 1, 2, \ldots$] $Y_{(m)}$ and
$X_{(m)}$ are in one-one correspondence, so that simultaneous possession of the
observed values of $T_1, T_2, \ldots$, and $T_m$ means possession of the outcomes of
$\mathcal{E}_1, \mathcal{E}_2, \ldots$, and $\mathcal{E}_m$.
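The one-one correspondence in the running-mean case can be made concrete. The following is a minimal sketch, assuming real outcomes represented exactly as fractions; the function names `running_means` and `recover_outcomes` are hypothetical and serve only this illustration. It uses the identity $x_m = m\,T_m - (m-1)\,T_{m-1}$ to recover the outcomes from the observed means.

```python
from fractions import Fraction

def running_means(xs):
    """Return [T_1, ..., T_m] with T_k = (x_1 + ... + x_k) / k."""
    means, total = [], Fraction(0)
    for k, x in enumerate(xs, start=1):
        total += Fraction(x)
        means.append(total / k)
    return means

def recover_outcomes(means):
    """Invert the map: x_k = k*T_k - (k-1)*T_{k-1}, with the empty sum taken as 0."""
    xs, previous_total = [], Fraction(0)
    for k, t in enumerate(means, start=1):
        total = k * t                 # x_1 + ... + x_k
        xs.append(total - previous_total)
        previous_total = total
    return xs

outcomes = [Fraction(3), Fraction(-1), Fraction(5, 2), Fraction(0)]
ys = running_means(outcomes)          # the values a "T-only" statistician sees
recovered = recover_outcomes(ys)      # identical to the original outcomes
```

Possessing all of $y_1, \ldots, y_m$ is therefore no reduction at all in this case, which is why the sequential theory is built on the single value $y_m$ rather than on $y_{(m)}$.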

In terms of the sample spaces and distributions introduced above, our basic
definitions are the following. $\{T_m\}$ is a sufficient sequence (for $P$) if, for each $m$,
$T_m$ is a sufficient statistic for the possible distributions of $x_{(m)}$. $\{T_m\}$ is a transitive
sequence (under each $p$ in $P$) if the following condition is satisfied for each $m$
and each $p$ in $P$: For any event $B$ depending only on $y_{m+1}$, the conditional
probability of $B$ given $x_{(m)}$ depends on $x_{(m)}$ only through $T_m$. In other words
(cf. Lemmas 4.6 and 4.8), $\{T_m\}$ is transitive if for each $m$ the conditional
distribution of $y_{m+1}$ given $x_{(m)}$ is the same as its conditional distribution given
$y_m = T_m(x_{(m)})$. Intuitively speaking, transitivity means that, under each possible
distribution of the sequence $x = (x_1, x_2, \ldots)$, $y_m$ is exactly as good a predictor
of $y_{m+1}$ as is $x_{(m)} = (x_1, \ldots, x_m)$, for $m = 1, 2, \ldots$. Less informal definitions,
and several characterizations, of "sufficient sequence" and "transitive sequence"
will be given later in terms of subfields. In this section and the following one, we
are concerned mainly with describing the reasons why these notions are of
importance, and the statistic terminology is the more appropriate one for our
immediate purposes.

For each $m$, let $a_m(x_{(m)})$ be an $\mathbf{S}_{(m)}$-measurable function on $X_{(m)}$ such that
$0 \le a_m \le 1$. We then say that the sequence $\{a_m\}$ is a sampling rule. In using
$\{a_m\}$, the statistician performs $\mathcal{E}_1, \mathcal{E}_2, \ldots$ in succession. When the first $m$
experiments have been carried out, he performs an experiment the outcome of which,
$u_m$ say, takes only the values 0 and 1 with $\Pr\{u_m = 1\}$ equal to the observed
value of $a_m$. If $u_m = 1$, the experimentation is terminated, but if $u_m = 0$, then
$\mathcal{E}_{m+1}$ is carried out; $m = 1, 2, \ldots$. The total number of experiments which are
carried out in a given instance is called the sample size and is denoted by $n$.
When $x$ is distributed according to $p$, the probability distribution of $n$ in using
$\{a_m\}$ is given by

$$\Pr\{n = m\} = \int_{X_{(m)}} \alpha_m(x_{(m)})\, dp\tau_m^{-1},$$

where

$$\alpha_m(x_1, \ldots, x_m) = \begin{cases} a_1(x_1), & m = 1, \\ \prod_{i=1}^{m-1} [1 - a_i(x_1, \ldots, x_i)] \cdot a_m(x_1, \ldots, x_m), & m > 1. \end{cases} \tag{8.1}$$

The sampling rule is said to be closed if the probability of terminating the
experimentation at some stage is always unity, that is, if $\sum_m \Pr\{n = m\} = 1$
for each $p$ in $P$. A sampling rule does not, in general, require that the experiments
be performed one at a time. The possibility of grouping remains open, the group
size after the first $m$ experiments have been carried out being a nonrandomized
function of their outcomes ($m = 1, 2, \ldots$).
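The passage from a sampling rule $\{a_m\}$ to the stopping probabilities $\alpha_m$ of (8.1), and from these to the distribution of $n$, can be sketched by direct enumeration. The sketch below assumes independent binomial trials with success probability `theta` and a hypothetical rule `a` that is truncated at stage `M`; neither the rule nor the truncation comes from the paper.

```python
from fractions import Fraction
from itertools import product

M = 4  # hypothetical truncation: the rule below always stops by stage M

def a(x):
    """Hypothetical sampling rule a_m(x_1..x_m): probability of stopping now."""
    if len(x) >= M:
        return Fraction(1)
    return Fraction(3, 4) if x[0] == 1 else Fraction(1, 4)

def alpha(x):
    """alpha_m of (8.1): continue at stages 1..m-1, then stop at stage m."""
    prob = a(x)
    for i in range(1, len(x)):
        prob *= 1 - a(x[:i])
    return prob

def prob_x(x, theta):
    """Pr{x_(m) = x} for independent Bernoulli(theta) trials (assumed model)."""
    s = sum(x)
    return theta ** s * (1 - theta) ** (len(x) - s)

def sample_size_distribution(theta):
    """[Pr{n = 1}, ..., Pr{n = M}], each a sum of alpha_m over all x_(m)."""
    return [sum(alpha(x) * prob_x(x, theta) for x in product((0, 1), repeat=m))
            for m in range(1, M + 1)]

dist = sample_size_distribution(Fraction(1, 3))
is_closed_here = sum(dist) == 1  # the truncated rule is closed
```

Because the rule stops with certainty at stage `M`, the probabilities sum to one for every `theta`, which is the closedness condition in this finite setting.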

A sampling rule $\{a_m\}$ is said to be based on $\{T_m\}$ if for each $m$ there exists a
function, $a_m^0$ say, on $Y_m$ such that $a_m(x_{(m)}) = a_m^0[T_m(x_{(m)})]$. In using such a
sampling rule, after the first $m$ experiments have been carried out, the decision
whether or not experimentation is to be continued depends only on the
observed value of $T_m$ ($m = 1, 2, \ldots$).

Let the typical outcome $x_1, \ldots, x_n$ of using a closed sampling rule be denoted
by $z$. It can be shown that if $\{T_m\}$ is a sufficient sequence, and $z$ is obtained
according to a specified rule, then the sample size $n = n(z)$ and $T_n = T_{n(z)}(z)$
together constitute a sufficient statistic for the possible distributions of $z$. This
important result has the usual consequences in statistical decision problems
(cf. Sec. 1). In particular, if the sample space of $z$ is of type (R, R), and $\{T_m\}$
is a sufficient sequence, the statistician who is supplied only with the observed
sample size $n$ and the observed value $y_n$ of $T_n$ in using a particular rule $\{a_m\}$
could, if he wished, construct a hypothetical outcome $z^* = (x_1^*, \ldots, x_n^*)$ such
that, for each $p$ in $P$, the probability distribution of $z^*$ is identical with that
of the total outcome of using $\{a_m\}$.

Now suppose that $\{T_m\}$ is a sufficient sequence. Let there be given a closed
sampling rule $\{a_m\}$, and consider the problem of constructing one based on $\{T_m\}$,
say $\{a_m^0\}$, which is equivalent to $\{a_m\}$ in some adequate sense. It is intuitively
clear (and easily proved) that in general there exists no $\{a_m^0\}$ such that, for
each $p$ in $P$, the probability distributions of $z$ under the two rules are identical.
This last requirement is, however, unnecessarily strong; since $\{T_m\}$ is a sufficient
sequence, the results stated above show that in using any given rule, the
statistician could, without disadvantage, regard $n$ and $y_n$ rather than $z$ itself as
the outcome of the sequential experimentation.

The problem thus reduces to the construction, if possible, of a sampling
rule $\{a_m^0\}$ based on $\{T_m\}$ such that, for each $p$ in $P$, the joint distribution of $n$
and $y_n$ under $\{a_m^0\}$ is identical with their joint distribution under the given
rule $\{a_m\}$. The assumed sufficiency of the sequence $\{T_m\}$ turns out to be a
necessary but insufficient condition for the existence (in general) of such an
$\{a_m^0\}$, and the additional condition required is precisely that $\{T_m\}$ be transitive.
Methods for constructing the $\{a_m^0\}$ equivalent to a given $\{a_m\}$ are stated in the
paragraphs following Theorem 8.2 below.

By combining a part of the above result with the one stated at the end of
the third paragraph back, we obtain the following result. If $\{T_m\}$ is a sufficient
and transitive sequence, and the sample space of $z$ is of type (R, R), then
corresponding to any closed sampling rule $\{a_m\}$ there exists a closed sampling rule
$\{a_m^0\}$ based on $\{T_m\}$ such that: (i) the two rules are equally expensive, that is,
for each $p$ in $P$, the probability distribution of $n$ is the same for the two; and
(ii) the two rules yield the same amount of information concerning the (unknown)
actual distribution of $x$, that is, a statistician who is supplied with the outcome
of using $\{a_m^0\}$ could calculate a sequence $z^* = (x_1^*, \ldots, x_n^*)$ such that, for
each $p$ in $P$, the probability distribution of $z^*$ is identical with that of the
outcome of using $\{a_m\}$, and conversely. If the sequence of experiments is regular
in a sense to be defined later, the requirement that $\{T_m\}$ be transitive can be
omitted from the hypotheses of the last-stated result.

We proceed to a formal statement of the main results described above. Let
$Z$ be the set of all finite sequences $z = (x_1, \ldots, x_m)$ where $x_i$ is a possible
outcome of $\mathcal{E}_i$, for $i = 1, \ldots, m$ with $m = 1, 2, \ldots$. For each $z$ in $Z$, write $n(z) = m$
if and only if $z$ has $m$ coordinates $x_i$, with $m = 1, 2, \ldots$. If $K$ is a subset of $Z$
and $A_{(1)}, A_{(2)}, \ldots$ is a sequence of subsets of $X_{(1)}, X_{(2)}, \ldots$ respectively, write
$K \sim [A_{(1)}, A_{(2)}, \ldots]$ if and only if $\chi_K(z) = f_{n(z)}(z)$ for all $z \in Z$, where
$f_m(x_{(m)})$ is the characteristic function of $A_{(m)}$ for $m = 1, 2, \ldots$. The relation $\sim$
establishes a one-to-one correspondence between subsets of $Z$ and sequences of
subsets of $X_{(1)}, X_{(2)}, \ldots$.

Let $\mathbf{Z}$ be the class of all sets $K \subseteq Z$ such that $K \sim [A_{(1)}, A_{(2)}, \ldots]$ where
$A_{(m)}$ is an $\mathbf{S}_{(m)}$-measurable set for $m = 1, 2, \ldots$. That $\mathbf{Z}$ is a field is readily
seen. We take $(Z, \mathbf{Z})$ to be the sample space of the outcome of using a closed
sampling rule on the given sequence of experiments. If $K \sim [A_{(1)}, A_{(2)}, \ldots]$,
the event "$z \in K$" is, of course, the union over all $m$ of the (mutually exclusive)
events $E_m$ = "$n = m$ and $x_{(m)} \in A_{(m)}$". Note that $E_m$ is impossible and can
therefore be omitted from the union in case $A_{(m)}$ is the empty set. In particular,
if $A_{(m)}$ is the empty set for $m \ne r$ and $A_{(r)} = X_{(r)}$, the event "$z \in K$" is simply
the event "$n = r$." It is easy to verify that $(Z, \mathbf{Z})$ is of type (R, R) if and only
if each of the spaces $(X_{(m)}, \mathbf{S}_{(m)})$ is of that type.

Let $\{a_m\}$ be a closed sampling rule, and let $\{\alpha_m\}$ be the corresponding
sequence of functions defined by (8.1). When $x$ is distributed in $(X, \mathbf{S})$ according
to $p$ and $\{a_m\}$ is used, $z$ is distributed in $(Z, \mathbf{Z})$ according to $q$, where for any
$K \in \mathbf{Z}$, $K \sim [A_{(1)}, A_{(2)}, \ldots]$,

$$q(K) = \sum_{m=1}^{\infty} \int_{A_{(m)}} \alpha_m(x_{(m)})\, dp\tau_m^{-1}.$$

Let $Q$ be the set of all $q$ corresponding to $p$ in $P$; since $Q$ depends on $\{a_m\}$, we
write $Q = Q\{a_m\}$.

Define $V(z) = [n(z), T_{n(z)}(z)]$. Then $V$ is a statistic on $Z$. The typical value
of $V$ is denoted by $(n, y_n)$.

THEOREM 8.1. $\{T_m\}$ is a sufficient sequence if and only if, for each closed sampling
rule $\{a_m\}$, $V$ is a sufficient statistic for the measures $Q\{a_m\}$ on $\mathbf{Z}$.

An outline of the proof follows. Suppose first that $\{T_m\}$ is a sufficient sequence.
Let there be given a closed sampling rule $\{a_m\}$, and let $\{\alpha_m\}$ be the corresponding
sequence of functions defined by (8.1). Choose and fix an arbitrary $K \in \mathbf{Z}$, say
$K \sim [A_{(1)}, A_{(2)}, \ldots]$, and let $f_m$ be the characteristic function of $A_{(m)}$ for
$m = 1, 2, \ldots$. For each $m$, let $\varphi_m$ and $\psi_m$ be nonnegative $\mathbf{T}_m$-measurable
functions of $y_m$ such that, for each $p$ in $P$, $\varphi_m(y_m)$ and $\psi_m(y_m)$ are the
conditional expectations of $\alpha_m(x_{(m)}) \cdot f_m(x_{(m)})$ and of $\alpha_m(x_{(m)})$, respectively, given
$T_m(x_{(m)}) = y_m$. Define $h_m(y_m) = \varphi_m(y_m)/\psi_m(y_m)$ if $\psi_m(y_m) > 0$ and $= 1$ (say)
otherwise for $m = 1, 2, \ldots$. Set $g(n, y_n) = h_n(y_n)$. Then, for each $q$ in $Q\{a_m\}$,
$g(n, y_n)$ is the conditional expectation of $\chi_K(z)$ given $V(z) = (n, y_n)$. Since $K$
is arbitrary, it follows that $V$ is sufficient for $Q\{a_m\}$.

Suppose now that $V$ satisfies the last-stated condition. Choose and fix a
positive integer $k$, and define $a_m = 0$ for $m < k$ and $= 1$ for $m \ge k$. In using
$\{a_m\}$, $(Z, \mathbf{Z})$ reduces to $(X_{(k)}, \mathbf{S}_{(k)})$; $V$ to $T_k$; and $Q\{a_m\}$ to $\{p\tau_k^{-1} : p \in P\}$.
It follows, therefore, that $T_k$ is sufficient for $\{p\tau_k^{-1} : p \in P\}$. Since $k$ is arbitrary,
$\{T_m\}$ is a sufficient sequence. This completes the outline proof.
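The first half of the outline can be illustrated numerically. The sketch below is not the author's argument; it assumes independent binomial trials with $T_m$ the number of successes, and a hypothetical truncated sampling rule `a`. It computes the conditional distribution of the outcome $z$ given $V(z) = (n, y_n)$ and shows that the result does not involve the parameter.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

M = 3  # hypothetical truncation stage

def a(x):
    """Hypothetical sampling rule; it looks at x_1, not only at the running sum."""
    if len(x) >= M:
        return Fraction(1)
    return Fraction(3, 4) if x[0] == 1 else Fraction(1, 4)

def alpha(x):
    """alpha_m of (8.1) for the rule a."""
    prob = a(x)
    for i in range(1, len(x)):
        prob *= 1 - a(x[:i])
    return prob

def conditional_given_V(theta):
    """Map each outcome z to Pr{z | V(z)}, where V(z) = (n(z), number of successes)."""
    q = {}
    for m in range(1, M + 1):
        for x in product((0, 1), repeat=m):
            s = sum(x)
            q[x] = alpha(x) * theta ** s * (1 - theta) ** (m - s)
    totals = defaultdict(Fraction)
    for z, mass in q.items():
        totals[(len(z), sum(z))] += mass
    return {z: mass / totals[(len(z), sum(z))] for z, mass in q.items()}

cond_a = conditional_given_V(Fraction(1, 3))
cond_b = conditional_given_V(Fraction(4, 5))
same_for_both_parameters = cond_a == cond_b
```

The two dictionaries coincide because the factor $\theta^{s}(1-\theta)^{m-s}$ is constant on each set $\{V = (m, s)\}$ and cancels, leaving the ratio $\alpha_m(z) / \sum \alpha_m$, which is the role played by $h_m = \varphi_m/\psi_m$ in the outline.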

The non-trivial part of Theorem 8.1 appears to have been stated and used
first by Girshick, Mosteller, and Savage ([16], p. 15) in the context of
estimation from binomial samples. A proof is contained in [9] for the case when
the given rule is based on $\{T_m\}$. Since the result is valid without restriction,
Blackwell's construction of unbiased sequential estimates [9] can be extended to
any closed sampling rule.

THEOREM 8.2. $\{T_m\}$ is a sufficient and transitive sequence if and only if,
corresponding to each closed sampling rule, there exists a closed sampling rule based on
$\{T_m\}$ such that, for each $p$ in $P$, the probability distribution of $(n, y_n) = V(z)$
is the same under the two rules.

This theorem is a consequence of Theorem 11.4, and its proof will be indicated
in Section 11. If $\{T_m\}$ is a sufficient and transitive sequence, and there is given a
closed rule $\{a_m\}$, the proof shows that the corresponding equivalent rule based on
$\{T_m\}$ is determined as follows. For each $m$, let $E_m^0$ be the event "$n = m$ in using
$\{a_m\}$," and let $E_m$ be the event "$n \ge m$ in using $\{a_m\}$." For each $m$, regard
$E_m^0$ and $E_m$ as events defined in terms of the nonsequential outcome of the first
$m$ experiments and let $f_m^0$ and $f_m$ be functions of $y_m$ with $0 \le f_m^0 \le f_m$, such
that, for each $p$ in $P$, $f_m^0(y_m)$ and $f_m(y_m)$ are the conditional probabilities of $E_m^0$
and $E_m$, respectively, given $T_m(x_{(m)}) = y_m$. The sampling rule in question is
$a_m^0(y_m) = f_m^0(y_m)/f_m(y_m)$ if $f_m(y_m) > 0$ and $= 1$ otherwise, for $m = 1, 2, \ldots$.

An alternative (but necessarily equivalent) construction for $\{a_m^0\}$ is the following:
$a_m^0(y_m) = f_m^0(y_m)/g_m(y_m)$ if $g_m(y_m) > 0$ and $= 1$ otherwise, for $m = 1, 2, \ldots$,
where $g_1 = 1$ and $g_m$ = the conditional expectation given $y_m$ of
$(1 - a_1^0)(1 - a_2^0) \cdots (1 - a_{m-1}^0)$ for $m > 1$. Assuming that $\{a_m^0\}$ as defined is
a sampling rule, that is to say, $0 \le a_m^0 \le 1$ for each $m$, it is easily seen that
$\{a_m^0\}$ is equivalent to $\{a_m\}$. The fact that $\{a_m^0\}$ is indeed a sampling rule (so that
$g_m(y_m)$ = the conditional probability given $y_m$ of the event "$n \ge m$ in using
$\{a_m\}$"), and that this rule is the same as the one defined in the preceding
paragraph, are consequences of transitivity.
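The construction of a rule based on $\{T_m\}$ can be tried out on a small case. The following sketch assumes independent binomial trials with $T_m$ the number of successes (a sufficient and transitive sequence), a hypothetical truncated rule `a` that depends on $x_1$, and the reading of the construction in which the reduced rule is the ratio of the conditional probability of "$n = m$" to that of "$n \ge m$" given $y_m$. Given $T_m = y$, all outcomes with $y$ successes are equally likely, so the conditional probabilities are plain averages.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

M = 4  # hypothetical truncation stage

def a(x):
    """Hypothetical closed sampling rule, not based on T_m for m >= 2."""
    if len(x) >= M:
        return Fraction(1)
    return Fraction(3, 4) if x[0] == 1 else Fraction(1, 4)

def beta(x):
    """Probability of reaching stage m = len(x) along x under the rule a."""
    prob = Fraction(1)
    for i in range(1, len(x)):
        prob *= 1 - a(x[:i])
    return prob

def a0(m, y):
    """Reduced rule: Pr{n = m | T_m = y} / Pr{n >= m | T_m = y} under a."""
    level = [x for x in product((0, 1), repeat=m) if sum(x) == y]
    f0 = sum(beta(x) * a(x) for x in level) / len(level)
    f = sum(beta(x) for x in level) / len(level)
    return f0 / f if f > 0 else Fraction(1)

def alpha_original(x):
    return beta(x) * a(x)

def alpha_reduced(x):
    prob = a0(len(x), sum(x))
    for i in range(1, len(x)):
        prob *= 1 - a0(i, sum(x[:i]))
    return prob

def joint_n_y(stop_prob, theta):
    """Joint distribution of (n, y_n) when stop_prob plays the role of alpha_m."""
    dist = defaultdict(Fraction)
    for m in range(1, M + 1):
        for x in product((0, 1), repeat=m):
            s = sum(x)
            dist[(m, s)] += stop_prob(x) * theta ** s * (1 - theta) ** (m - s)
    return dict(dist)

theta = Fraction(1, 3)
same_joint = joint_n_y(alpha_original, theta) == joint_n_y(alpha_reduced, theta)
```

The two rules generally give different distributions to the full outcome $z$, but the joint distribution of $(n, y_n)$ agrees for every value of `theta`, which is the equivalence asserted in Theorem 8.2.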

The following result is an immediate consequence of Theorems 8.1 and 8.2.

COROLLARY 8.1. $\{T_m\}$ is a sufficient and transitive sequence if and only if the
following conditions are satisfied.

(i) For each closed sampling rule $\{a_m\}$, $V$ is a sufficient statistic for the measures
$Q\{a_m\}$ on $\mathbf{Z}$.

(ii) Corresponding to each closed sampling rule, there exists a closed sampling
rule based on $\{T_m\}$ such that, for each $p$ in $P$, the probability distribution of
$(n, y_n) = V(z)$ is the same under the two rules.

The preceding discussion is entirely in terms of an arbitrary sequence $T_1,
T_2, \ldots$ of statistics on $X_{(1)}, X_{(2)}, \ldots$ respectively. The problem of determining
explicitly all sequences $\{T_m\}$ which are sufficient and transitive is at present
unsolved even in the simplest cases. There is a related but more important
unsolved problem. Suppose that $\{T_m^*\}$ is a necessary and sufficient sequence,
that is, $T_m^*$ is a necessary and sufficient statistic for the measures $\{p\tau_m^{-1} : p \in P\}$,
for $m = 1, 2, \ldots$. The problem is to characterize frameworks $(X, \mathbf{S})$, $P$ in
which $\{T_m^*\}$ is transitive. It can be shown that a sufficient condition that $\{T_m^*\}$
be transitive is that $x_1, x_2, \ldots$ be a sequence of independent chance variables
for each $p$ in $P$. This result is one of the few results concerning statistics which
are stated in this paper without proof, but which are not corollaries of the
corresponding results for subfields. There is no difficulty, however, in constructing
a proof parallel to that of Theorem 11.5, which is the corresponding result for
subfields, given in the final section of the paper.

Now consider very briefly the case when the given sequence of experiments is
finite, say $\mathcal{E}_1, \mathcal{E}_2, \ldots$ and $\mathcal{E}_k$ with $k > 1$. Let $T_1, T_2, \ldots$ and $T_k$ be statistics
on $X_{(1)}, X_{(2)}, \ldots$ and $X_{(k)}$, respectively. Sufficiency and transitivity can be
defined in this case in the obvious way, that is, $\{T_m\}$ is a sufficient sequence if
$T_m$ is a sufficient statistic for the possible distributions of $x_{(m)}$, for $m = 1, \ldots, k$.
If the condition which appears in the previous definition is satisfied for each
$m = 1, \ldots, k - 1$, $\{T_m\}$ is a transitive sequence. Then Theorem 8.1 and
Corollary 8.1 remain valid.

It turns out, however, that Theorem 8.2 as stated is not quite true in this
case; the condition of the theorem is satisfied if and only if $\{T_m\}$ is transitive
and $T_m$ is a sufficient statistic for $m = 1, \ldots, k - 1$. This not very interesting
difference between the finite and infinite cases is about the only one, so in the
sequel we shall for simplicity confine ourselves to the infinite case. The finite case
can, for mathematical purposes, be regarded as a "special case" of the infinite
one. A finite sequence of experiments and statistics can always be extended into
the infinite case in such a way that the sufficiency and/or transitivity of the
sequence of statistics is not destroyed by the extension. One such extension is:
given $\mathcal{E}_1, \ldots, \mathcal{E}_k$ and $T_1, \ldots, T_k$, for each $m > k$, let $\mathcal{E}_m$ be the trivial
experiment for which $x_m = 0$, and let $T_m(x_1, \ldots, x_m) = T_k(x_1, \ldots, x_k)$.

In concluding this section, we recall that we have been discussing a sample
space $(X, \mathbf{S})$, a set $P$ of probability measures on $\mathbf{S}$, a fixed naturally determined
sequence $\{\tau_m\}$ of statistics on $X$, with $\tau_m$ a function of $\tau_{m+1}$, and an arbitrary
sequence $\{U_m\}$ of statistics on $X$, with $U_m = T_m\tau_m$ a function of $\tau_m$. In the
final sections of the paper, we shall, in effect, replace each of these statistics by
the subfield of $\mathbf{S}$ induced by it. Therefore we shall discuss $(X, \mathbf{S})$, $P$, a fixed
sequence $\{\mathbf{S}^{(m)}\}$ of subfields of $\mathbf{S}$ with $\mathbf{S}^{(m)} \subseteq \mathbf{S}^{(m+1)}$, and an arbitrary sequence
$\{\mathbf{S}_0^{(m)}\}$ of subfields of $\mathbf{S}$ with $\mathbf{S}_0^{(m)} \subseteq \mathbf{S}^{(m)}$. All definitions and results concerning
$\{\mathbf{S}_0^{(m)}\}$ can then be translated into corresponding definitions and results
concerning the sequence $\{T_m\}$ of the present section by applying Lemmas 3.1 and
3.2 to the following identifications, with $m = 1, 2, \ldots$:

$\mathbf{S}^{(m)} = \tau_m^{-1}(\mathbf{S}_{(m)})$ = the subfield of $\mathbf{S}$ induced by $\tau_m$;

$\mathbf{S}_0^{(m)} = \tau_m^{-1}(T_m^{-1}(\mathbf{T}_m))$ = the subfield of $\mathbf{S}$ induced by $U_m = T_m\tau_m$.


9. Some examples. In each of the examples which follow, $\mathcal{E}_1, \mathcal{E}_2, \ldots$ is a
sequence of binomial trials. The trials are independent and identical only in
Examples 9.1 and 9.2. In each example, $x_m = 1$ if the outcome of $\mathcal{E}_m$ is "success"
and $x_m = 0$ if the outcome is "failure;" $X_{(m)}$ is the set of all points $(x_1, \ldots, x_m)$
with $x_i = 0$ or 1 for $i = 1, \ldots, m$; and $\mathbf{S}_{(m)}$ is the class of all subsets of $X_{(m)}$,
with $m = 1, 2, \ldots$.

The first three examples show that a given sequence $\{T_m\}$ of statistics may
be sufficient and transitive (Exam. 9.1), or transitive but not sufficient (Exam.
9.2), or sufficient but not transitive (Exam. 9.3).

EXAMPLE 9.1. Suppose that $x_1, x_2, \ldots$ is a sequence of independent random
variables, with $\Pr\{x_m = 1\} = \theta$ for each $m$, where $\theta$ is an entirely unknown
fraction. Let $T_m(x_1, \ldots, x_m) = x_1 + \cdots + x_m$ be the number of successes in
the first $m$ trials. As is well known, $\{T_m\}$ is a sufficient sequence. To show that
$\{T_m\}$ is transitive, consider arbitrary but fixed $m$ and $\theta$. Then, for each $k = 0,
1, \ldots, m + 1$, the conditional probability of the event $T_{m+1} = k$ given $x_{(m)}$ is $\theta$
if $T_m(x_{(m)}) = k - 1$, is $1 - \theta$ if $T_m(x_{(m)}) = k$, and is zero otherwise. It now
follows from the additivity of conditional probability that for any event $B$
depending only on $T_{m+1}$, the conditional probability of $B$ given $x_{(m)}$ depends
on the condition only through $T_m$. Since $m$ and $\theta$ are arbitrary, it follows that
$\{T_m\}$ is transitive.

EXAMPLE 9.2. In the preceding example, let $T_m = m/2$ (say) for each $m$.
Then $\{T_m\}$ is transitive but not sufficient. The verification is omitted.

EXAMPLE 9.3. Let $x_1, x_2, \ldots$ be a sequence of independent random variables,
with $\Pr\{x_1 = 1\} = \tfrac12$ and $\Pr\{x_m = 1\} = \theta$ for $m > 1$. Let $T_1(x_1) = \tfrac12$ (say),
and let $T_m(x_1, \ldots, x_m) = (x_1, \sum_{i=2}^{m} x_i)$ for $m > 1$. Then $\{T_m\}$ is a sufficient
sequence. To show that $\{T_m\}$ is not transitive, let $B$ be the event that $x_1 = 1$;
this is certainly an event depending only on the value of $T_2 = (x_1, x_2)$. The
conditional probability of $B$ given $x_1$ is 0 or 1, accordingly as $x_1 = 0$ or 1, and is
clearly not a function of $T_1$, that is, not a constant. Hence $\{T_m\}$ is not transitive.
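On a finite sample space the transitivity condition can be checked by enumeration: the conditional distribution of $y_{m+1}$ given $x_{(m)}$ must be the same for all $x_{(m)}$ sharing a value of $T_m$. The sketch below applies this to the frameworks of Examples 9.1 and 9.3, taking $\Pr\{x_1 = 1\} = 1/2$ in Example 9.3 and one arbitrary value of $\theta$; the helper name `is_transitive_at` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

THETA = Fraction(1, 3)  # an arbitrary illustrative parameter value

def is_transitive_at(m, prob, T):
    """True if Pr{T_{m+1} = v | x_(m)} depends on x_(m) only through T_m(x_(m)).

    prob(x) is Pr{x_(m+1) = x}; T(x) is the statistic applied to a tuple x."""
    seen = {}
    for x in product((0, 1), repeat=m):
        masses = defaultdict(Fraction)
        for nxt in (0, 1):
            masses[T(x + (nxt,))] += prob(x + (nxt,))
        total = sum(masses.values())
        if total == 0:
            continue  # x_(m) has probability zero; nothing to check
        cond = {v: p / total for v, p in masses.items() if p > 0}
        if seen.setdefault(T(x), cond) != cond:
            return False
    return True

def bernoulli(x, theta):
    s = sum(x)
    return theta ** s * (1 - theta) ** (len(x) - s)

# Example 9.1: independent identical trials, T_m = number of successes.
prob_91 = lambda x: bernoulli(x, THETA)
T_91 = sum

# Example 9.3: Pr{x_1 = 1} = 1/2, later trials Bernoulli(theta);
# T_1 is a constant and T_m = (x_1, x_2 + ... + x_m) for m > 1.
prob_93 = lambda x: Fraction(1, 2) * bernoulli(x[1:], THETA)
T_93 = lambda x: Fraction(1, 2) if len(x) == 1 else (x[0], sum(x[1:]))

transitive_91 = all(is_transitive_at(m, prob_91, T_91) for m in (1, 2, 3))
transitive_93_stage_1 = is_transitive_at(1, prob_93, T_93)
```

In Example 9.3 the check fails already at $m = 1$: the two values of $x_1$ share the constant $T_1$ but lead to different distributions for $T_2 = (x_1, x_2)$.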

The transitivity of $\{T_m\}$ in Example 9.1 is an illustration of the general result
stated after Corollary 8.1, since $\{T_m\}$ is a necessary and sufficient sequence in
that example. We shall now give examples to show that if $x_1, x_2, \ldots$ is not an
independent sequence for each $p$ in $P$, the necessary and sufficient sequence
may or may not be transitive (Examples 9.4 and 9.5 respectively).

EXAMPLE 9.4. Suppose that $\Pr\{x_1 = 1\} = \theta$ and $\Pr\{x_m = x_1\} = 1$ for all
$m$, where $\theta$ is an entirely unknown fraction. Let $T_m(x_1, \ldots, x_m) = x_1$ for each
$m$. The sequence $\{T_m\}$ is necessary, sufficient, and transitive. The verification is
omitted.

EXAMPLE 9.5. Suppose that $\Pr\{x_1 = 1\} = \tfrac12$, and, given $x_1$, that $x_2, x_3, \ldots$
is a sequence of independent random variables with $\Pr\{x_m = 1\} = \theta$ or $\delta$,
accordingly as $x_1 = 0$ or 1, for each $m = 2, 3, \ldots$, where $\theta$ and $\delta$ are entirely
unknown fractions. Let $T_1(x_1) = \tfrac12$ and $T_m(x_1, \ldots, x_m) = (x_1, \sum_{i=2}^{m} x_i)$ for
$m > 1$. Then $\{T_m\}$ is necessary and sufficient, but not transitive. The
verification is omitted. (Cf. Exam. 9.3.)

The last example shows that if a sequence $\{T_m\}$ is sufficient but not transitive,
then it does not necessarily reduce a statistical decision problem. (Cf. also the
reference to this example in Sec. 1.)

EXAMPLE 9.6. Suppose that the experimental framework is that of Example
9.5, and it is required to estimate $\theta$. Suppose that if in a given instance the
sample size is $n$ and $\theta$ is estimated by the value $t$, the statistician incurs a loss
$L$ which is $(t - \theta)^2$ if $n < 2$, $2(t - \theta)^2$ if $n = 2$, and infinite if $n > 2$. For
any estimation procedure $\mu$, let $r_\mu(\theta, \delta)$ denote the expected value of $L$ in using
$\mu$. Regarded as a function of the unknown parameters $\theta$ and $\delta$, $r_\mu(\theta, \delta)$ is called
the risk function of $\mu$ [13].

Now let $\mu$ be the following procedure: "Observe $x_1$. If $x_1 = 1$, terminate the
experimentation and estimate $\theta$ to be $\tfrac12$; if $x_1 = 0$, observe $x_2$ also, terminate
the experimentation, and estimate $\theta$ to be $\tfrac18$ or $\tfrac78$, accordingly as $x_2 = 0$ or
1." A simple computation shows that for all $\theta$ and $\delta$

$$r_\mu(\theta, \delta) = \tfrac{9}{64}. \tag{9.1}$$
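The "simple computation" can be reproduced exactly. The sketch below assumes $\Pr\{x_1 = 1\} = 1/2$ and that the procedure uses the estimates $1/2$ after one observation and $1/8$ or $7/8$ after two; with these values the risk is constant in $\theta$ (and does not involve $\delta$, since $x_2$ is observed only when $x_1 = 0$). The function name `risk_mu` is hypothetical.

```python
from fractions import Fraction

def risk_mu(theta):
    """Expected loss of the procedure mu for a given theta."""
    half = Fraction(1, 2)
    # x_1 = 1 (probability 1/2): stop with n = 1, loss (t - theta)^2 with t = 1/2.
    stop_after_one = half * (Fraction(1, 2) - theta) ** 2
    # x_1 = 0 (probability 1/2): observe x_2 ~ Bernoulli(theta), n = 2,
    # loss 2 (t - theta)^2 with t = 1/8 if x_2 = 0 and t = 7/8 if x_2 = 1.
    stop_after_two = half * 2 * (
        (1 - theta) * (Fraction(1, 8) - theta) ** 2
        + theta * (Fraction(7, 8) - theta) ** 2
    )
    return stop_after_one + stop_after_two

risks = [risk_mu(Fraction(k, 10)) for k in range(11)]
constant_risk = set(risks) == {Fraction(9, 64)}
```

Expanding the expression by hand gives the same conclusion: the coefficients of $\theta$, $\theta^2$ and $\theta^3$ all vanish, leaving $\tfrac18 + \tfrac1{64} = \tfrac9{64}$.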


Let $\{T_m\}$ be the sequence of statistics considered in Example 9.5. Then
$\{T_m\}$ is a sufficient sequence. We shall show, however, that each of the
following statements is false:

(i) There exists a procedure $\nu$ based on $\{T_m\}$ which is equivalent to $\mu$, that is,
for each $\theta$ and $\delta$, the joint distribution of $n$ and $t$ is the same under the two
procedures (cf. Sec. 1).

(ii) There exists a $\nu$ based on $\{T_m\}$ such that $r_\nu(\theta, \delta) = r_\mu(\theta, \delta)$ for all $\theta$ and $\delta$.

(iii) There exists a $\nu$ based on $\{T_m\}$ which is minimax in the class of all
estimation procedures.

Since (i) evidently implies (ii), it will be sufficient to show that (ii) and (iii)
are false. Suppose to the contrary that one or both of the statements (ii) and
(iii) are true. It then follows immediately from (9.1) that there exists a $\nu$ based
on $\{T_m\}$ such that for all $\theta$ and $\delta$

$$r_\nu(\theta, \delta) \le \tfrac{9}{64}. \tag{9.2}$$

(As a matter of fact, the minimax risk in this example is $\tfrac{9}{64}$, so that $\mu$ is a
minimax procedure, but this property of $\mu$ is irrelevant to the present argument.)
Examination of the loss function $L$ and of the sequence $\{T_m\}$ now shows that
this $\nu$ must have the following structure: "Take no observations with
probability $\alpha$, and take two observations with probability $1 - \alpha$. If no observations
are taken, estimate $\theta$ to be $t_0$; if $x_1$ and $x_2$ are observed, estimate $\theta$ to be $t_2$."
Here $\alpha$ is a fixed constant, $0 \le \alpha \le 1$, and $t_0$ and $t_2$ are (possibly randomized)
functions of the observations available at the terminal stages $n = 0$ and $n = 2$,
respectively. To a user of $\nu$, the stages $n = 0$ and $n = 1$ are, of course, the same.
Then, using an obvious notation,

$$r_\nu(\theta, \delta) = \alpha E[(t_0 - \theta)^2 \mid \theta, \delta] + 2(1 - \alpha) E[(t_2 - \theta)^2 \mid \theta, \delta]. \tag{9.3}$$


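Define $t_i^0 = 1$ if $t_i > 1$; $= 0$ if $t_i < 0$; and $= t_i$ if $0 \le t_i \le 1$, for $i = 0, 2$.
Then, for any $\theta$ with $0 < \theta < 1$, with probability one $(t_i^0 - \theta)^2 \le (t_i - \theta)^2$ for
$i = 0, 2$. Hence

$$\alpha E[(t_0^0 - \theta)^2 \mid \theta, \delta] + 2(1 - \alpha) E[(t_2^0 - \theta)^2 \mid \theta, \delta] \le \tfrac{9}{64} \tag{9.4}$$

for all $\theta$ and $\delta$, by (9.2) and (9.3). Let $u_0$ be the expected value of $t_0^0$, and
$u_2 = u_2(x_1, x_2)$ be the conditional expected value of $t_2^0$ given $(x_1, x_2)$. It is clear that
these expected values exist finitely, and that they do not depend on $\theta$ and $\delta$.
By Lemma 3.1 of [3] we have

$$(u_0 - \theta)^2 \le E[(t_0^0 - \theta)^2 \mid \theta, \delta], \qquad E[(u_2 - \theta)^2 \mid \theta, \delta] \le E[(t_2^0 - \theta)^2 \mid \theta, \delta] \tag{9.5}$$

for all $\theta$ and $\delta$. Writing

$$u_2(0, 0) = a, \quad u_2(0, 1) = b, \quad u_2(1, 0) = c, \quad u_2(1, 1) = d,$$

it follows from (9.4) and (9.5) that

$$\alpha(u_0 - \theta)^2 + (1 - \alpha)\left[(a - \theta)^2(1 - \theta) + (b - \theta)^2\theta + (c - \theta)^2(1 - \delta) + (d - \theta)^2\delta\right] \le \tfrac{9}{64} \tag{9.6}$$

for all $\theta$ and $\delta$. Letting $(\theta, \delta)$ tend successively to $(0, 0)$, $(1, 0)$, $(0, 1)$, and $(1, 1)$
in (9.6) shows that

$$\begin{aligned}
\alpha u_0^2 + (1 - \alpha)[a^2 + c^2] &\le \tfrac{9}{64}, \\
\alpha(1 - u_0)^2 + (1 - \alpha)[(1 - b)^2 + (1 - c)^2] &\le \tfrac{9}{64}, \\
\alpha u_0^2 + (1 - \alpha)[a^2 + d^2] &\le \tfrac{9}{64}, \\
\alpha(1 - u_0)^2 + (1 - \alpha)[(1 - b)^2 + (1 - d)^2] &\le \tfrac{9}{64}.
\end{aligned} \tag{9.7}$$

Adding the inequalities (9.7) and omitting terms in $a$ and $b$, we have

$$2\alpha[u_0^2 + (1 - u_0)^2] + (1 - \alpha)[c^2 + (1 - c)^2 + d^2 + (1 - d)^2] \le \tfrac{9}{16}. \tag{9.8}$$

Now, $\tfrac12 \le z^2 + (1 - z)^2$ for all real $z$. Hence (9.8) implies $\alpha + (1 - \alpha) \le \tfrac{9}{16}$.
This contradiction establishes the desired conclusion, namely that each of
the statements (i), (ii), and (iii) is false.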

It should be observed that in Example 9.6 there does exist a $\nu$ based on $\{T_m\}$
such that, for each $\theta$ and $\delta$, the marginal distributions of $n$ and $t$ in using $\nu$ are


identical with the marginal distributions in using uw. This v is defined by ‘a; 
+ and as 1 (i1.e., a = 4), with 4, 4 while ¢.(x; , x2) = 14 if a, = O and zy 0 


but = % if a 0 and x, 1, and = 4 otherwise.”’ It would be interesting 
to know the conditions (if any) under which this situation occurs in the general 


case. 


10. Definitions in terms of subfields. Sequential decision functions. Let there
be given a set $X$ of points $x$, a field $\mathbf{S}$ of subsets of $X$, a set $P$ of probability
measures $p$ on $\mathbf{S}$, and an infinite sequence $\mathbf{S}^{(1)}, \mathbf{S}^{(2)}, \ldots$ of subfields of $\mathbf{S}$ such
that

$$\mathbf{S}^{(m)} \subseteq \mathbf{S}^{(m+1)}, \qquad m = 1, 2, \ldots. \tag{10.1}$$

Throughout this section and the following one, $X$, $\mathbf{S}$, $P$, and $\{\mathbf{S}^{(m)}\}$ will remain
fixed. They will sometimes be referred to as "the framework." The framework
is to be thought of as follows: $(X, \mathbf{S})$ is the sample space of points
$x = (x_1, x_2, \ldots)$ with $x$ distributed according to some $p$ in $P$, and $\mathbf{S}^{(1)}, \mathbf{S}^{(2)}, \ldots$
is the sequence of subfields of $\mathbf{S}$ induced by $\tau_1, \tau_2, \ldots$ respectively, where
$\tau_m(x_1, x_2, \ldots) = (x_1, \ldots, x_m)$.







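An $\{\mathbf{S}^{(m)}\}$-measurable sampling rule is a sequence $\{a_m\}$ of functions of $x$
such that $a_m$ is $\mathbf{S}^{(m)}$-measurable and $0 \le a_m \le 1$ for $m = 1, 2, \ldots$. Given an
$\{\mathbf{S}^{(m)}\}$-measurable rule $\{a_m\}$, for each $m$ we write

$$\begin{aligned}
\alpha_1(x) &= a_1(x), & \beta_1(x) &= 1, \\
\alpha_m(x) &= \prod_{i=1}^{m-1} [1 - a_i(x)] \cdot a_m(x), & \beta_m(x) &= \prod_{i=1}^{m-1} [1 - a_i(x)], \qquad m > 1.
\end{aligned} \tag{10.2}$$

Then for each $m$

$$\alpha_m(x) = a_m(x) \cdot \beta_m(x), \qquad \beta_m(x) - \alpha_m(x) = \beta_{m+1}(x). \tag{10.3}$$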


It also follows from (10.1) and (10.2) that $\alpha_m$ and $\beta_m$ are $\mathbf{S}^{(m)}$-measurable
functions for $m = 1, 2, \ldots$. The rule $\{a_m\}$ is said to be closed if $\sum_{m=1}^{\infty} \alpha_m(x) = 1\ [\mathbf{S}, P]$.
Since in any case $\sum_{m=1}^{\infty} \alpha_m(x) \le 1$ for each $x$, it follows that $\{a_m\}$ is closed if and
only if $\sum_{m=1}^{\infty} \int_X \alpha_m(x)\, dp = 1$ for each $p$ in $P$.
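The relations (10.3) give a simple recursion for $\alpha_m$ and $\beta_m$ and, by telescoping, the identity $\sum_{m \le M} \alpha_m(x) = 1 - \beta_{M+1}(x)$, which is why the sum of the $\alpha_m$ can never exceed 1. The sketch below works at a single fixed point $x$, with hypothetical values for $a_1(x), \ldots, a_4(x)$.

```python
from fractions import Fraction

def alpha_beta(a_values):
    """Given a_1(x), ..., a_M(x) at a fixed x, return ([alpha_1..alpha_M],
    [beta_1..beta_{M+1}]) using the recursion implied by (10.3)."""
    alphas, betas = [], [Fraction(1)]           # beta_1 = 1
    for a_m in a_values:
        alphas.append(a_m * betas[-1])          # alpha_m = a_m * beta_m
        betas.append(betas[-1] - alphas[-1])    # beta_{m+1} = beta_m - alpha_m
    return alphas, betas

# Hypothetical stopping probabilities at one sample point.
a_values = [Fraction(1, 4), Fraction(1, 2), Fraction(0), Fraction(2, 3)]
alphas, betas = alpha_beta(a_values)

partial_sum = sum(alphas)        # probability of stopping by stage 4
still_running = betas[-1]        # beta_5: probability of continuing past stage 4
```

Here `partial_sum + still_running` equals 1 exactly, so the rule is closed at this $x$ precisely when $\beta_{M+1}(x)$ tends to 0 as $M$ increases.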


In the remainder of this section, we suppose to be given an infinite sequence
$(D_1, \mathbf{D}_1), (D_2, \mathbf{D}_2), \ldots$ of measurable spaces. The $m$th, $(D_m, \mathbf{D}_m)$, is called the
$m$th terminal decision space. Each of the terminal decision spaces is assumed
to be of type (R, R).

An $\{\mathbf{S}^{(m)}\}$-measurable terminal decision rule is a sequence $\{b_m\}$ such that
$b_m = b_m(C_m, x)$ is a decision function in the sense of Section 7 on $(X, \mathbf{S}^{(m)})$
into $(D_m, \mathbf{D}_m)$, that is, $b_m$ is an $\mathbf{S}^{(m)}$-measurable function of $x$ for each $C_m \in \mathbf{D}_m$,
and a probability measure on $\mathbf{D}_m$ for each $x$, for $m = 1, 2, \ldots$. An $\{\mathbf{S}^{(m)}\}$-measurable
decision function is a double sequence $[\{a_m\}, \{b_m\}]$ where $\{a_m\}$ is an
$\{\mathbf{S}^{(m)}\}$-measurable sampling rule and $\{b_m\}$ is an $\{\mathbf{S}^{(m)}\}$-measurable terminal
decision rule. For any $\{\mathbf{S}^{(m)}\}$-measurable decision function $\mu = [\{a_m\}, \{b_m\}]$
we write

$$\lambda_p(m : C_m \mid \mu) = \int_X \alpha_m(x) \cdot b_m(C_m, x)\, dp. \tag{10.4}$$

Two decision functions $\mu$ and $\nu$ are said to be equivalent (cf. Sec. 1) if for each
$m$, each $C_m \in \mathbf{D}_m$ and each $p \in P$, $\lambda_p(m : C_m \mid \mu) = \lambda_p(m : C_m \mid \nu)$.
Let $\mathbf{S}_0^{(1)}, \mathbf{S}_0^{(2)}, \ldots$ be an arbitrary sequence of subfields of $\mathbf{S}$ such that

$$\mathbf{S}_0^{(m)} \subseteq \mathbf{S}^{(m)}, \qquad m = 1, 2, \ldots. \tag{10.5}$$


An $\{\mathbf{S}_0^{(m)}\}$-measurable sampling rule, or terminal decision rule, or decision
function is defined exactly as above with $\{\mathbf{S}^{(m)}\}$ replaced by $\{\mathbf{S}_0^{(m)}\}$. The relations
(10.5) imply that an $\{\mathbf{S}_0^{(m)}\}$-measurable sampling rule (terminal decision rule)
[decision function] is also an $\{\mathbf{S}^{(m)}\}$-measurable sampling rule (terminal decision
rule) [decision function]. It is not assumed that $\mathbf{S}_0^{(m)} \subseteq \mathbf{S}_0^{(m+1)}$ for each $m$. In
consequence, if $\{a_m^0\}$ is an $\{\mathbf{S}_0^{(m)}\}$-measurable sampling rule and $\{\alpha_m^0\}$ and $\{\beta_m^0\}$
are the corresponding sequences defined by (10.2), then $\alpha_m^0$ and $\beta_m^0$ are
$\mathbf{S}^{(m)}$-measurable but not necessarily $\mathbf{S}_0^{(m)}$-measurable functions.

DEFINITION 10.1. $\{\mathbf{S}_0^{(m)}\}$ is a sufficient sequence if $\mathbf{S}_0^{(m)}$ is sufficient for the
measures $P$ on $\mathbf{S}^{(m)}$, $m = 1, 2, \ldots$.

DEFINITION 10.2. $\{\mathbf{S}_0^{(m)}\}$ is a necessary sequence if $\mathbf{S}_0^{(m)}$ is necessary for the
measures $P$ on $\mathbf{S}^{(m)}$, $m = 1, 2, \ldots$.

THEOREM 10.1. If $\{\mathbf{S}_0^{(m)}\}$ is a sufficient sequence, then corresponding to each
$\{\mathbf{S}^{(m)}\}$-measurable decision function $\mu = [\{a_m\}, \{b_m\}]$ there exists an
$\{\mathbf{S}_0^{(m)}\}$-measurable terminal decision rule $\{b_m^0\}$ such that $\nu = [\{a_m\}, \{b_m^0\}]$ is equivalent to $\mu$.

PROOF. Given $\mu = [\{a_m\}, \{b_m\}]$, it follows from Theorem 5.1 that for each $m$
there exists $\varphi_m(C_m, x)$ such that $\varphi_m$ is a finite measure on $\mathbf{D}_m$ for each $x$ and
an $\mathbf{S}_0^{(m)}$-measurable function for each $C_m$, and such that

$$\varphi_m(C_m, x) = E_p(\alpha_m(x) \cdot b_m(C_m, x) \mid \mathbf{S}_0^{(m)}) \quad [\mathbf{S}, p] \tag{10.6}$$

for each $C_m \in \mathbf{D}_m$ and $p \in P$. For each $m$, $C_m$ and $x$ define

$$b_m^0(C_m, x) = \begin{cases} \varphi_m(C_m, x)/\varphi_m(D_m, x) & \text{if } \varphi_m(D_m, x) > 0, \\ \pi_m(C_m) & \text{otherwise,} \end{cases} \tag{10.7}$$

where $\pi_m$ is an arbitrary probability measure on $\mathbf{D}_m$. Then $\{b_m^0\}$ is an
$\{\mathbf{S}_0^{(m)}\}$-measurable terminal decision rule. We shall show that $\nu = [\{a_m\}, \{b_m^0\}]$ is
equivalent to the given $\mu$.

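Choose and fix arbitrary $m$, $C_m \in \mathbf{D}_m$ and $p \in P$. We have

$$\begin{aligned}
\lambda_p(m : C_m \mid \nu) &= \int_X \alpha_m(x) \cdot b_m^0(C_m, x)\, dp && \text{by (10.4),} \\
&= \int_X E_p(\alpha_m(x) \mid \mathbf{S}_0^{(m)}) \cdot b_m^0(C_m, x)\, dp && \text{by Lemmas 4.6 and 4.1,} \\
&= \int_X \varphi_m(D_m, x) \cdot b_m^0(C_m, x)\, dp && \text{by (10.6) with } C_m = D_m, \\
&= \int_X \varphi_m(C_m, x)\, dp && \text{by (10.7),} \\
&= \int_X E_p(\alpha_m(x) \cdot b_m(C_m, x) \mid \mathbf{S}_0^{(m)})\, dp && \text{by (10.6),} \\
&= \int_X \alpha_m(x) \cdot b_m(C_m, x)\, dp && \text{by Lemma 4.1,} \\
&= \lambda_p(m : C_m \mid \mu) && \text{by (10.4).}
\end{aligned}$$

This completes the proof.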
DEFINITION 10.3. $\{\mathbf{S}_0^{(m)}\}$ is a transitive sequence if for each $m$, each
$B \in \mathbf{S}_0^{(m+1)}$, and each $p \in P$

$$E_p(\chi_B(x) \mid \mathbf{S}^{(m)}) = E_p(\chi_B(x) \mid \mathbf{S}_0^{(m)}) \quad [\mathbf{S}, p]. \tag{10.8}$$

THEOREM 10.2. If $\{\mathbf{S}_0^{(m)}\}$ is a sufficient and transitive sequence, then corresponding
to each $\{\mathbf{S}^{(m)}\}$-measurable decision function, there exists an equivalent
$\{\mathbf{S}_0^{(m)}\}$-measurable decision function.

PROOF. Let there be given an $\{\mathbf{S}^{(m)}\}$-measurable decision function $\mu =
[\{a_m\}, \{b_m\}]$. From the sufficiency of $\{\mathbf{S}_0^{(m)}\}$ it follows by Theorem 10.1 that
there exists an $\{\mathbf{S}_0^{(m)}\}$-measurable terminal decision rule $\{b_m^0\}$ such that $\nu =
[\{a_m\}, \{b_m^0\}]$ is equivalent to $\mu$. It follows from the sufficiency and transitivity
of $\{\mathbf{S}_0^{(m)}\}$, by Theorem 11.4 of the following section, that there exists an
$\{\mathbf{S}_0^{(m)}\}$-measurable sampling rule $\{a_m^0\}$ such that for each $m$ and $p \in P$,

$$E_p(\alpha_m(x) \mid \mathbf{S}_0^{(m)}) = E_p(\alpha_m^0(x) \mid \mathbf{S}_0^{(m)}) \quad [\mathbf{S}, p].$$

Since $\{b_m^0\}$ is $\{\mathbf{S}_0^{(m)}\}$-measurable, it follows easily from (10.4) and the last stated
relations by means of Lemmas 4.1 and 4.6 that $\nu^0 = [\{a_m^0\}, \{b_m^0\}]$ is equivalent
to $\nu$. Hence $\nu^0$ is equivalent to $\mu$. Since $\mu$ is arbitrary, the theorem follows.

It will be shown next that if the framework satisfies a certain structural
condition, then the requirement that $\{\mathbf{S}_0^{(m)}\}$ be transitive can be omitted from the
hypothesis of Theorem 10.2.

DEFINITION 10.4. The framework is regular if there exists a sequence $\mathbf{S}_*^{(1)},
\mathbf{S}_*^{(2)}, \ldots$, say, of subfields of $\mathbf{S}^{(1)}, \mathbf{S}^{(2)}, \ldots$ respectively, such that $\{\mathbf{S}_*^{(m)}\}$ is
necessary, sufficient, and transitive.

A necessary and sufficient sequence, if it exists, is essentially unique.
Consequently, the framework is regular if and only if there exist sequences which are
necessary and sufficient, and each such sequence is transitive.

THEOREM 10.3. Suppose that the framework is regular. If $\{\mathbf{S}_0^{(m)}\}$ is a sufficient
sequence, then corresponding to each $\{\mathbf{S}^{(m)}\}$-measurable decision function, there
exists an equivalent $\{\mathbf{S}_0^{(m)}\}$-measurable decision function.

PROOF. Let $\mu$ be an $\{\mathbf{S}^{(m)}\}$-measurable decision function, and let $\{\mathbf{S}_*^{(m)}\}$ be a
necessary, sufficient, and transitive sequence. It follows from Theorem 10.2 that
there exists an $\{\mathbf{S}_*^{(m)}\}$-measurable decision function, say $\nu_*$, which is equivalent
to $\mu$. Since $\{\mathbf{S}_*^{(m)}\}$ is necessary and $\{\mathbf{S}_0^{(m)}\}$ is sufficient, we have $\mathbf{S}_*^{(m)} \subseteq \mathbf{S}_0^{(m)}$
$[\mathbf{S}, P]$ for each $m$, and it follows from Lemma 7.1 that there exists an
$\{\mathbf{S}_0^{(m)}\}$-measurable decision function $\nu$ which is equivalent to $\nu_*$. Clearly, $\nu$ is equivalent
to $\mu$. Since $\mu$ is arbitrary, the theorem follows.

Remarks. (i) Apart from Theorem 10.3, the notion of regularity is of interest 
because if the framework is regular, there exists a sufficient and transitive 
sequence which is minimal not only in the class of sufficient and transitive 
sequences but also in the class of sufficient sequences; consequently, it affords 
the best possible reductions of the given decision problem by means of each of 
the theorems of this section. This sequence is, of course, any necessary and 
sufficient sequence. 

(ii) If $\{S_0^{(m)}\}$ is sufficient but not transitive, and the framework is not regular, then the conclusion of Theorems 10.2 and 10.3 is not necessarily valid. This is shown by Example 9.6. Neither of these two theorems contains the other; there are cases where Theorem 10.2 applies but not Theorem 10.3, and conversely (cf. Sec. 9).


11. Characterizations of sufficiency and transitivity. Regularity. 


THEOREM 11.1. $\{S_0^{(m)}\}$ is transitive if and only if for each $m$, $A \in S^{(m)}$, and $p \in P$

(11.1)  $E_p(\chi_A(x) \mid S_0^{(m+1)}) = E_p(E_p(\chi_A(x) \mid S_0^{(m)}) \mid S_0^{(m+1)})$  $[S, p]$.

The corresponding result for a sequence of statistics (see Sec. 8) is: "$\{T_m\}$ is a transitive sequence if and only if for each $m$, each $p$, and each event $A$ depending only on $x_1, x_2, \cdots$ and $x_m$, the conditional probability of $A$ given $y_{m+1}$ equals the conditional expectation given $y_{m+1}$ of the conditional probability of $A$ given $y_m$."
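The statistics form of Theorem 11.1 can be checked exactly on a small example. The sketch below is only an illustration under assumptions that are not in the text: the observations are independent Bernoulli trials with success probability `p`, and the statistic is the running sum $y_m = x_1 + \cdots + x_m$. The function name `transitivity_holds` and the choice of events are hypothetical. Exact rational arithmetic is used so that the two sides of (11.1) can be compared without rounding.

```python
from fractions import Fraction
from itertools import product


def transitivity_holds(m, p, event):
    """Check (11.1) exactly for i.i.d. Bernoulli(p) trials with y_m = x_1 + ... + x_m.

    `event` maps the first m coordinates (a tuple of 0/1) to True/False.
    `p` should be a Fraction strictly between 0 and 1.
    """
    # Joint probabilities of all outcomes (x_1, ..., x_{m+1}).
    weight = {}
    for x in product((0, 1), repeat=m + 1):
        k = sum(x)
        weight[x] = p**k * (1 - p) ** (m + 1 - k)

    def prob_event_given_ym(s):
        # Conditional probability of the event given y_m = s.
        num = sum(w for x, w in weight.items() if sum(x[:m]) == s and event(x[:m]))
        den = sum(w for x, w in weight.items() if sum(x[:m]) == s)
        return num / den

    for s1 in range(m + 2):  # possible values of y_{m+1}
        cell = [(x, w) for x, w in weight.items() if sum(x) == s1]
        den = sum(w for _, w in cell)
        # Left side of (11.1): P(A | y_{m+1} = s1).
        lhs = sum(w for x, w in cell if event(x[:m])) / den
        # Right side: E[ P(A | y_m) | y_{m+1} = s1 ].
        rhs = sum(w * prob_event_given_ym(sum(x[:m])) for x, w in cell) / den
        if lhs != rhs:
            return False
    return True
```

For instance, `transitivity_holds(3, Fraction(1, 3), lambda x: x[0] == 1)` compares the two sides for the event "the first trial is a success". The identity holds here because, given $y_m$, the first $m$ observations and $y_{m+1}$ are conditionally independent.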

PROOF. Consider a fixed $m$ and a fixed $p$ in $P$. Let $A$ be an $S^{(m)}$-measurable set and $B$ an $S_0^{(m+1)}$-measurable set. Then, by using Lemmas 4.1 and 4.6 we have

(11.2)
$$\int_A E_p(\chi_B(x) \mid S^{(m)})\,dp = \int_X \chi_A(x)\,\chi_B(x)\,dp = \int_B E_p(\chi_A(x) \mid S_0^{(m+1)})\,dp,$$

(11.3)
$$\begin{aligned}
\int_A E_p(\chi_B(x) \mid S_0^{(m)})\,dp &= \int_X E_p(\chi_A(x) \mid S_0^{(m)})\,E_p(\chi_B(x) \mid S_0^{(m)})\,dp \\
&= \int_B E_p(\chi_A(x) \mid S_0^{(m)})\,dp \\
&= \int_B E_p(E_p(\chi_A(x) \mid S_0^{(m)}) \mid S_0^{(m+1)})\,dp.
\end{aligned}$$

Hence

(11.4)
$$\int_A E_p(\chi_B(x) \mid S^{(m)})\,dp = \int_A E_p(\chi_B(x) \mid S_0^{(m)})\,dp$$

if and only if

(11.5)
$$\int_B E_p(\chi_A(x) \mid S_0^{(m+1)})\,dp = \int_B E_p(E_p(\chi_A(x) \mid S_0^{(m)}) \mid S_0^{(m+1)})\,dp.$$

Since the integrands in (11.4) are $S^{(m)}$-measurable functions [cf. (10.5)], while those in (11.5) are $S_0^{(m+1)}$-measurable, it follows easily from the equivalence of (11.4) and (11.5) that (10.5) holds for each $B \in S_0^{(m+1)}$ if and only if (11.1) holds for each $A \in S^{(m)}$. Since in this argument $m$ and $p$ are arbitrary, Theorem 11.1 is proved.

If, in the argument following (11.3), we replace the last members of (11.2) 
and (11.3) by the respective second members, we obtain instead of Theorem 
11.1 the following intermediate result, pointed out to the author by L. J. Savage. 

THEOREM 11.2. $\{S_0^{(m)}\}$ is transitive if and only if for each $m$, $A \in S^{(m)}$, $B \in S_0^{(m+1)}$, and $p \in P$

(11.6)
$$\int_X \chi_A(x)\,\chi_B(x)\,dp = \int_X E_p(\chi_A(x) \mid S_0^{(m)})\,E_p(\chi_B(x) \mid S_0^{(m)})\,dp.$$

It can be seen from the corresponding result for a sequence of statistics that the condition that $\{T_m\}$ be transitive is a weakening of the following condition: "Given $y_m$, $x_{(m)}$ and $y_{m+1}$ are conditionally independently distributed ($p \in P$; $m = 1, 2, \cdots$)."





We record here for later reference the facts that if $\{S_0^{(m)}\}$ is transitive, then for each $p \in P$

(11.7)  $E_p(g(x) \mid S^{(m)}) = E_p(g(x) \mid S_0^{(m)})$  $[S, p]$

for every $S_0^{(m+1)}$-$P$-integrable function $g$, and

(11.8)  $E_p(f(x) \mid S_0^{(m+1)}) = E_p(E_p(f(x) \mid S_0^{(m)}) \mid S_0^{(m+1)})$  $[S, p]$

for every $S^{(m)}$-$P$-integrable $f$, for $m = 1, 2, \cdots$. This follows from Definition 10.3 and Theorem 11.1 by an obvious argument (cf. Theorem 5.1).

For each $m$, let $P_0^{(m)}$ be the set of all probability measures $q$ on $S$ of the form $dq = g(x)\,dp$, where $p$ is a member of $P$ and $g$ is a nonnegative $S_0^{(m+1)}$-measurable function. Since $g \equiv 1$ is certainly $S_0^{(m+1)}$-measurable, it is clear that

(11.9)  $P \subset P_0^{(m)}$,  $m = 1, 2, \cdots$.


THEOREM 11.3. $\{S_0^{(m)}\}$ is sufficient and transitive if and only if $S_0^{(m)}$ is sufficient for the measures $P_0^{(m)}$ on $S^{(m)}$, $m = 1, 2, \cdots$.

A heuristic description of the corresponding result for a sequence of statistics is: "$\{T_m\}$ is a sufficient and transitive sequence if and only if, for each $m$, $T_m$ is a sufficient statistic for the set of conditional distributions (corresponding to $p$ in $P$ and $y_{m+1}$ in $Y_{m+1}$) of $x_{(m)}$ given $y_{m+1}$."

PROOF. Suppose first that $\{S_0^{(m)}\}$ is sufficient and transitive. Consider a fixed $m$, and let $A$ be an $S^{(m)}$-measurable set. By hypothesis and Theorem 5.1 there exists an $S_0^{(m)}$-$P$-integrable function, $f$ say, such that

(11.10)  $f(x) = E_p(\chi_A(x) \mid S_0^{(m)})$  $[S, p]$ for each $p \in P$.

Let $q$ be a member of $P_0^{(m)}$, say $dq = g(x)\,dp$, and let $C$ be an $S_0^{(m)}$-measurable set. Then

$$\begin{aligned}
q(A \cap C) &= \int_X \chi_A(x)\,\chi_C(x)\,dq \\
&= \int_X \chi_A(x)\,\chi_C(x)\,g(x)\,dp \\
&= \int_X \chi_A(x)\,\chi_C(x)\,E_p(g(x) \mid S^{(m)})\,dp \\
&= \int_X \chi_A(x)\,\chi_C(x)\,E_p(g(x) \mid S_0^{(m)})\,dp, \quad \text{by (11.7)}, \\
&= \int_X \chi_A(x)\,E_p(\chi_C(x)\,g(x) \mid S_0^{(m)})\,dp \\
&= \int_X E_p(\chi_A(x) \mid S_0^{(m)})\,E_p(\chi_C(x)\,g(x) \mid S_0^{(m)})\,dp \\
&= \int_X f(x)\,E_p(\chi_C(x)\,g(x) \mid S_0^{(m)})\,dp, \quad \text{by (11.10)}, \\
&= \int_X f(x)\,\chi_C(x)\,g(x)\,dp = \int_C f(x)\,dq,
\end{aligned}$$

using Lemmas 4.1 and 4.6. Hence, since $C \in S_0^{(m)}$ and $q$ are arbitrary, and $f$ is $S_0^{(m)}$-measurable,

(11.11)  $f(x) = E_q(\chi_A(x) \mid S_0^{(m)})$  $[S, q]$ for each $q \in P_0^{(m)}$.

Since $m$ and $A \in S^{(m)}$ are arbitrary, we conclude from (11.11) that the condition in question is satisfied.

Suppose now that the condition is satisfied. It then follows immediately from (11.9) that $\{S_0^{(m)}\}$ is a sufficient sequence, and it remains to show that it is transitive. Consider a fixed $m$ and an $S^{(m)}$-measurable set $A$, as before. By hypothesis, there exists an $S_0^{(m)}$-measurable function, $f$ say, such that (11.11) holds. We observe that (11.9) and (11.11) imply (11.10). Now let $p$ be a member of $P$ and $B$ an $S_0^{(m+1)}$-measurable set with $p(B) > 0$, and let $dq = c\,\chi_B(x)\,dp$, where $c = 1/p(B)$. Then, using Lemmas 4.1 and 4.6,

$$\begin{aligned}
c \int_X \chi_A(x)\,\chi_B(x)\,dp &= \int_X \chi_A(x)\,dq \\
&= \int_X E_q(\chi_A(x) \mid S_0^{(m)})\,dq \\
&= \int_X f(x)\,dq, \quad \text{by (11.11)}, \\
&= c \int_X f(x)\,\chi_B(x)\,dp \\
&= c \int_X f(x)\,E_p(\chi_B(x) \mid S_0^{(m)})\,dp \\
&= c \int_X E_p(\chi_A(x) \mid S_0^{(m)})\,E_p(\chi_B(x) \mid S_0^{(m)})\,dp, \quad \text{by (11.10)}.
\end{aligned}$$

Since $c \ne 0$, it follows that (11.6) holds. We have therefore shown that (11.6) holds for each $m$, $p \in P$, $A \in S^{(m)}$, and $B \in S_0^{(m+1)}$ with $p(B) > 0$. Since (11.6) certainly holds with both sides equal to zero if $p(B) = 0$, the condition of Theorem 11.2 is satisfied, and hence $\{S_0^{(m)}\}$ is transitive. This completes the proof of Theorem 11.3.

THEOREM 11.4. $\{S_0^{(m)}\}$ is sufficient and transitive if and only if corresponding to each $\{S^{(m)}\}$-measurable sampling rule $\{a_m\}$ there exists an $\{S_0^{(m)}\}$-measurable sampling rule $\{a_m^0\}$ such that

(11.12)  $E_p(\alpha_m(x) \mid S_0^{(m)}) = E_p(\alpha_m^0(x) \mid S_0^{(m)})$  $[S, p]$

for each $m$ and each $p$ in $P$.

The corresponding result for a sequence of statistics reads: "$\{T_m\}$ is a sufficient and transitive sequence if and only if corresponding to each sampling rule there exists a sampling rule based on $\{T_m\}$ such that, for each $m$ and $p$, the conditional probability of the event '$n = m$' given $y_m$ is the same for the two rules."





PROOF. Suppose first that the condition is satisfied. Let $k$ be a fixed positive integer and $A$ a fixed $S^{(k)}$-measurable set, and define $a_m = 0$ for $m < k$, $= \chi_A$ for $m = k$, and $= 1$ for $m > k$. Then by (10.2)

(11.13)
$$\alpha_m(x) = \begin{cases} 0, & m < k,\ m > k + 1, \\ \chi_A(x), & m = k, \\ 1 - \chi_A(x), & m = k + 1. \end{cases}$$

By hypothesis, there exists an $\{S_0^{(m)}\}$-measurable rule $\{a_m^0\}$ such that (11.12) is satisfied. Since each $\alpha_m^0$ is a nonnegative function, it follows from (11.12) and (11.13) that

(11.14)  $\alpha_m^0(x) = 0$,  $m < k$, $m > k + 1$  $[S, P]$.

It follows from the definition (10.2) of $\{\alpha_m^0\}$ and (11.14) that

(11.15)  $\alpha_k^0(x) = a_k^0(x)$  $[S, P]$.

Since $a_k^0$ is an $S_0^{(k)}$-measurable function, it follows from (11.12) and (11.13), both with $m = k$, and from (11.15) that

(11.16)  $E_p(\chi_A(x) \mid S_0^{(k)}) = a_k^0(x)$  $[S, p]$ for each $p \in P$.


We observe next that for each $p$ in $P$

$$\begin{aligned}
\sum_{m=1}^{\infty} \int_X \alpha_m^0(x)\,dp &= \sum_{m=1}^{\infty} \int_X \alpha_m(x)\,dp, \quad \text{by (11.12) and Lemma 4.1}, \\
&= \int_X \Big( \sum_{m=1}^{\infty} \alpha_m(x) \Big)\,dp \\
&= 1, \quad \text{by (11.13)},
\end{aligned}$$

so that $\{a_m^0\}$ is closed. It follows hence from (11.14) that

(11.17)  $\alpha_{k+1}^0(x) = 1 - \alpha_k^0(x)$  $[S, P]$.

It follows from (11.12) with $m = k + 1$, from (11.13) with $m = k + 1$, and from (11.15), (11.16), and (11.17) that for each $p \in P$

(11.18)  $E_p(\chi_A(x) \mid S_0^{(k+1)}) = E_p(E_p(\chi_A(x) \mid S_0^{(k)}) \mid S_0^{(k+1)})$  $[S, p]$.

Since $k$ and $A \in S^{(k)}$ are arbitrary, we conclude from (11.16) that $\{S_0^{(m)}\}$ is a sufficient sequence, and from (11.18) and Theorem 11.1 that $\{S_0^{(m)}\}$ is transitive.

Now suppose that $\{S_0^{(m)}\}$ is sufficient and transitive, and let there be given an $\{S^{(m)}\}$-measurable rule $\{a_m\}$. For each $m$, let $f_m^0(x)$ and $f_m(x)$ be nonnegative $S_0^{(m)}$-measurable functions with $0 \le f_m^0 \le f_m$, such that for each $p$ in $P$

(11.19)  $f_m^0(x) = E_p(\alpha_m(x) \mid S_0^{(m)})$  $[S, p]$,

(11.20)  $f_m(x) = E_p(\beta_m(x) \mid S_0^{(m)})$  $[S, p]$.





Since $0 \le \alpha_m \le \beta_m \le 1$ by (10.2), the existence of the functions $f_m^0$ and $f_m$ is assured by Theorem 5.1 and Lemma 4.3. Define for $m = 1, 2, \cdots$

(11.21)
$$a_m^0(x) = \begin{cases} f_m^0(x)/f_m(x), & \text{if } f_m(x) > 0, \\ 1 \ (\text{say}), & \text{otherwise}. \end{cases}$$

Then $\{a_m^0\}$ is an $\{S_0^{(m)}\}$-measurable rule. We shall show that the definition (11.19), (11.20), (11.21) of $\{a_m^0\}$, together with the transitivity of $\{S_0^{(m)}\}$, implies that (11.12) is satisfied by $\{a_m^0\}$ and the given $\{a_m\}$. It will be shown incidentally that

(11.22)  $E_p(\beta_m(x) \mid S_0^{(m)}) = E_p(\beta_m^0(x) \mid S_0^{(m)})$  $[S, p]$

for each $m$ and each $p$ in $P$.

Consider the following propositions:

(i) (11.22) is satisfied for $m = 1$ and each $p$ in $P$;

(ii) if (11.22) is satisfied for $m$ and $p$ then (11.12) is satisfied for the same $m$ and $p$, with $m = 1, 2, \cdots$ and $p \in P$;

(iii) if (11.12) and (11.22) are satisfied for $m$ and $p$, then (11.12) and (11.22) are satisfied for $m + 1$ and $p$, with $m = 1, 2, \cdots$ and $p \in P$.

Clearly, it will be sufficient to establish (i), (ii) and (iii). Since $\beta_1 = 1 = \beta_1^0$ by (10.2), (i) is obviously valid, and it remains to establish (ii) and (iii). We consider arbitrary but fixed $m$ and $p$.


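Suppose that (11.22) is satisfied for $m$ and $p$. Then, except on an $S$-$p$-null set,

$$\begin{aligned}
E_p(\alpha_m^0(x) \mid S_0^{(m)}) &= E_p(a_m^0(x)\,\beta_m^0(x) \mid S_0^{(m)}) && \text{by (10.3)}, \\
&= a_m^0(x)\,E_p(\beta_m^0(x) \mid S_0^{(m)}) && \text{by Lemma 4.6}, \\
&= a_m^0(x)\,E_p(\beta_m(x) \mid S_0^{(m)}) && \text{by (11.22)}, \\
&= a_m^0(x)\,f_m(x) && \text{by (11.20)}, \\
&= f_m^0(x) && \text{by (11.21)}, \\
&= E_p(\alpha_m(x) \mid S_0^{(m)}) && \text{by (11.19)},
\end{aligned}$$

so that (11.12) holds. This establishes (ii).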


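Suppose now that (11.12) and (11.22) are satisfied for $m$ and $p$. Then, except on an $S$-$p$-null set,

$$\begin{aligned}
E_p(\beta_{m+1}(x) \mid S_0^{(m+1)}) &= E_p(\beta_m(x) \mid S_0^{(m+1)}) - E_p(\alpha_m(x) \mid S_0^{(m+1)}) && \text{by (10.3)}, \\
&= E_p(E_p(\beta_m(x) \mid S_0^{(m)}) \mid S_0^{(m+1)}) - E_p(E_p(\alpha_m(x) \mid S_0^{(m)}) \mid S_0^{(m+1)}) && \text{by (11.8)}, \\
&= E_p(E_p(\beta_m^0(x) \mid S_0^{(m)}) \mid S_0^{(m+1)}) - E_p(E_p(\alpha_m^0(x) \mid S_0^{(m)}) \mid S_0^{(m+1)}) && \text{by (11.12), (11.22)}, \\
&= E_p(\beta_m^0(x) \mid S_0^{(m+1)}) - E_p(\alpha_m^0(x) \mid S_0^{(m+1)}) && \text{by (11.8)}, \\
&= E_p(\beta_{m+1}^0(x) \mid S_0^{(m+1)}) && \text{by (10.3)}.
\end{aligned}$$

Thus (11.22) is satisfied for $m + 1$ and $p$. It now follows from (ii) that (11.12) is also satisfied for $m + 1$ and $p$, and (iii) is established. This completes the proof of Theorem 11.4.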

The preceding proof shows that Theorem 11.4 remains valid when all sampling 
rules are understood to be closed. This, together with the remark following the 
statement of the theorem, implies Theorem 8.2. 

In the following, for any two subfields $S_1$ and $S_2$, we denote by $S_1 * S_2$ the field generated by the class of all sets $A_1 \cap A_2$ with $A_1 \in S_1$ and $A_2 \in S_2$. It is easy to verify that $S_1 * S_2$ is the smallest field containing $S_1$ and $S_2$.

LEMMA 11.1. Let $S_1$, $S_1^0$, $S_2$, and $S_2^0$ be subfields of $S$ such that $S_i^0 \subset S_i$ and $S_i^0$ is sufficient for the measures $P$ on $S_i$, $i = 1, 2$. If $A_1 \in S_1$ and $A_2 \in S_2$ implies $p(A_1 \cap A_2) = p(A_1) \cdot p(A_2)$ for each $p$ in $P$, then $S_1^0 * S_2^0$ is sufficient for the measures $P$ on $S_1 * S_2$.

The corresponding result for statistics is: "If $x$ and $y$ are the outcomes of independent experiments, and $T(x)$ is sufficient for the distributions of $x$ while $U(y)$ is sufficient for the distributions of $y$, then $V(x, y) = [T(x), U(y)]$ is sufficient for the joint distributions of $x$ and $y$." The proof of Lemma 11.1 consists in verifying that the class of $(S_1 * S_2)$-measurable sets $A$, such that the conditional probability function of $A$ given $S_1^0 * S_2^0$ and $p$ is the same for each $p$ in $P$, is a field. Then it is verified that this class contains all sets $A_1 \cap A_2$ with $A_i \in S_i$ for $i = 1, 2$ and therefore coincides with $S_1 * S_2$. We omit these verifications.

The next and final theorem of this section gives a sufficient condition for 
regularity. 

THEOREM 11.5. Suppose that (a) $P$ is dominated on $S^{(m)}$ ($m = 1, 2, \cdots$) and that there exists a sequence $S^1, S^2, \cdots$ of subfields of $S$ such that (b) $S^{(1)} = S^1$ while $S^{(m+1)} = S^{(m)} * S^{m+1}$ for $m = 1, 2, \cdots$, and (c) $A \in S^{(m)}$ and $B \in S^{m+1}$ implies $p(A \cap B) = p(A) \cdot p(B)$ ($p \in P$; $m = 1, 2, \cdots$). Then the framework is regular.

PROOF. It follows from (a) by the results of Section 6 that there exists a necessary and sufficient sequence, say $\{S_*^{(m)}\}$. We have to show that $\{S_*^{(m)}\}$ is transitive (cf. Definition 10.4).

Consider a fixed $m$. Since $S_*^{(m)}$ is sufficient for the measures $P$ on $S^{(m)}$, and $S^{m+1}$ is trivially sufficient for the measures $P$ on itself, it follows from (b) and (c) by Lemma 11.1 that $S_*^{(m)} * S^{m+1}$ is sufficient for the measures $P$ on $S^{(m+1)}$. Hence,

(11.23)  $S_*^{(m+1)} \subset S_*^{(m)} * S^{m+1}$  $[S, P]$.


Now consider a fixed $p$ in $P$. It follows from (c) that for any $B \in S^{m+1}$ and any field $S_1^{(m)} \subset S^{(m)}$ we have

(11.24)  $E_p(\chi_B(x) \mid S_1^{(m)}) = p(B)$  $[S, p]$.

Let $C = A \cap B$, with $A \in S_*^{(m)}$ and $B \in S^{m+1}$. Then, except on an $S$-$p$-null set,

$$\begin{aligned}
E_p(\chi_C(x) \mid S^{(m)}) &= E_p(\chi_A(x)\,\chi_B(x) \mid S^{(m)}) \\
&= \chi_A(x)\,E_p(\chi_B(x) \mid S^{(m)}) && \text{by Lemma 4.6}, \\
&= \chi_A(x)\,p(B) && \text{by (11.24)}, \\
&= \chi_A(x)\,E_p(\chi_B(x) \mid S_*^{(m)}) && \text{by (11.24)}, \\
&= E_p(\chi_A(x)\,\chi_B(x) \mid S_*^{(m)}) && \text{by Lemma 4.6}, \\
&= E_p(\chi_C(x) \mid S_*^{(m)}).
\end{aligned}$$

Thus the definition of $C$ implies

(11.25)  $E_p(\chi_C(x) \mid S^{(m)}) = E_p(\chi_C(x) \mid S_*^{(m)})$  $[S, p]$.

Since, as is easily seen, the class of all sets $C \in S$ for which (11.25) holds is a field, we conclude that $C \in S_*^{(m)} * S^{m+1}$ implies (11.25). It now follows from (11.23) that $C \in S_*^{(m+1)}$ implies (11.25). Since $m$ and $p$ are arbitrary, $\{S_*^{(m)}\}$ is a transitive sequence. This completes the proof.

The following is a statement of Theorem 11.5 in the terminology of Section 8: "Suppose that, for each $m$, each of the possible distributions of $x_{(m)}$ admits a probability density function with respect to a fixed $\sigma$-finite measure $\lambda_{(m)}$ [condition (a)], and that for each $p$ in $P$, $x_1, x_2, \cdots$ is a sequence of independent chance variables [conditions (b) and (c)]. Let $T_1, T_2, \cdots$ be a sequence of statistics on $X_{(1)}, X_{(2)}, \cdots$ respectively. If $\{T_m\}$ is a sufficient sequence, then for each $m$ there exists a field $\mathcal{T}_m^0$ of subsets of the range of $T_m$ such that $T_m$ is a measurable transformation of $(X_{(m)}, S_{(m)})$ into $(Y_m, \mathcal{T}_m^0)$, and such that with each $T_m$ regarded as this measurable transformation, the sequence $\{T_m\}$ is, in effect, necessary, sufficient and transitive." We are unable to state the conclusion of the theorem entirely in terms of a sequence of statistics because the exact relations between statistics and subfields are not known at present.

Any framework in which P consists of only one measure is regular. Conse- 
quently, the condition of Theorem 11.5 is not necessary for regularity. On the 
other hand, Example 9.5 shows that not every framework is regular. 

12. Concluding Remark. It is instructive to verify in detail that the results 
concerning statistics and measurable transformations which are described in- 
formally in Sections 1, 8 and 11 do follow from the theorems concerning sub- 
fields given in the formal exposition. 

Acknowledgments. The author is indebted to the referees for several 
helpful suggestions. The author is indebted, in particular, to L. J. Savage whose 
suggestions, stimulating criticism, and encouragement have made a substantial 
contribution to this work. 

REFERENCES

[1] P. R. Halmos and L. J. Savage, "Application of the Radon-Nikodym theorem to the theory of sufficient statistics," Ann. Math. Stat., Vol. 20 (1949), pp. 225-241.

[2] E. L. Lehmann and H. Scheffé, "Completeness, similar regions and unbiased estimation. Part I," Sankhyā, Vol. 10 (1950), pp. 305-340.

[3] J. L. Hodges and E. L. Lehmann, "Some problems in minimax point estimation," Ann. Math. Stat., Vol. 21 (1950), pp. 182-197.

[4] A. Dvoretzky, A. Wald, and J. Wolfowitz, "Elimination of randomization in certain statistical decision procedures and zero sum two-person games," Ann. Math. Stat., Vol. 22 (1951), pp. 1-21.

[5] D. Blackwell, "On a theorem of Lyapunov," Ann. Math. Stat., Vol. 22 (1951), pp. 112-114.

[6] E. L. Lehmann, "A general concept of unbiasedness," Ann. Math. Stat., Vol. 22 (1951), pp. 587-592.

[7] L. J. Savage, The Foundations of Statistics, John Wiley and Sons, New York (1954), Chaps. 7 and 12.

[8] C. R. Rao, "Information and accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., Vol. 37 (1945), pp. 81-91.

[9] D. Blackwell, "Conditional expectation and unbiased sequential estimation," Ann. Math. Stat., Vol. 18 (1947), pp. 105-110.

[10] P. R. Halmos, Measure Theory, D. Van Nostrand Company, Inc., New York (1950).

[11] H. Robbins, "Mixture of distributions," Ann. Math. Stat., Vol. 19 (1948), pp. 360-369.

[12] G. Elfving, "Sufficiency and completeness in decision function theory," Ann. Acad. Sci. Fennicae, Ser. A. I., Math.-Phys., No. 135 (1952), Helsinki.

[13] A. Wald, Statistical Decision Functions, John Wiley and Sons, New York (1950).

[14] A. Berger, "Remark on separable spaces of probability measures," Ann. Math. Stat., Vol. 22 (1951), pp. 119-120.

[15] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York (1953).

[16] M. A. Girshick, F. Mosteller and L. J. Savage, "Unbiased estimates for certain binomial problems with applications," Ann. Math. Stat., Vol. 17 (1946), pp. 13-23.

[17] D. Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, John Wiley and Sons, New York (1954), Chap. 8.





ON A STOCHASTIC APPROXIMATION METHOD
By K. L. Chung¹

Cornell University and Syracuse University


1. Summary. Asymptotic properties are established for the Robbins-Monro [1] procedure of stochastically solving the equation $M(x) = \alpha$. Two disjoint cases are treated in detail. The first may be called the "bounded" case, in which the assumptions we make are similar to those in the second case of Robbins and Monro. The second may be called the "quasi-linear" case which restricts $M(x)$ to lie between two straight lines with finite and nonvanishing slopes but postulates only the boundedness of the moments of $Y(x) - M(x)$ (see Sec. 2 for notations). In both cases it is shown how to choose the sequence $\{a_n\}$ in order to establish the correct order of magnitude of the moments of $x_n - \theta$. Asymptotic normality of $a_n^{-1/2}(x_n - \theta)$ is proved in both cases under a further assumption. The case of a linear $M(x)$ is discussed to point up other possibilities. The statistical significance of our results is sketched.


2. Introduction. Let $M(x)$ be a fixed but unknown function and $\alpha$ a given (known) constant such that

(1)  $M(x) = \alpha$

has a unique (unknown) root $x = \theta$. Suppose that to each value $x$ corresponds a random variable $Y = Y(x)$ with distribution function $\Pr[Y(x) \le y] = H(y \mid x)$, such that

$$M(x) = \int_{-\infty}^{\infty} y\,dH(y \mid x)$$

is the mathematical expectation of $Y$ for the given $x$.

The Robbins-Monro procedure is defined as follows. Let $\{a_n\}$, $n \ge 1$, be a fixed sequence of positive constants such that

(2)  $\sum_{n=1}^{\infty} a_n = \infty$,  $\sum_{n=1}^{\infty} a_n^2 < \infty$.

We define a nonstationary Markov chain $\{x_n\}$ by taking $x_1$ to be an arbitrary constant and setting recursively

$$x_{n+1} = x_n + a_n(\alpha - y_n), \qquad n \ge 1,$$

where $y_n$ is a random variable whose distribution function, for given $x_1, \cdots, x_n$, $y_1, \cdots, y_{n-1}$, is $H(y \mid x_n)$. The moments of $x_n - \theta$ will be denoted as follows:

$$b_n^{(r)} = E[(x_n - \theta)^r], \qquad b_n^{(2)} = b_n, \qquad \beta_n^{(r)} = E[|x_n - \theta|^r].$$
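The recursion defining $\{x_n\}$ can be sketched in a few lines. The example below is only an illustration under assumptions that are not in the text: the regression function is taken to be the hypothetical linear $M(x) = 2x$ with standard normal noise, the target is $\alpha = 2$ (so that $\theta = 1$), and the step sizes are $a_n = c/n$, which satisfy condition (2). The names `robbins_monro` and `noisy_linear` are illustrative.

```python
import random


def robbins_monro(sample_y, alpha, x1, steps, c=1.0):
    """Run x_{n+1} = x_n + a_n (alpha - y_n) with a_n = c / n and return the last iterate."""
    x = x1
    for n in range(1, steps + 1):
        a_n = c / n            # sum a_n diverges, sum a_n^2 converges
        y_n = sample_y(x)      # one noisy observation of M at the current x
        x = x + a_n * (alpha - y_n)
    return x


rng = random.Random(0)  # fixed seed so the run is reproducible


def noisy_linear(x):
    # Hypothetical model: M(x) = 2x observed with N(0, 1) noise.
    return 2.0 * x + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_linear, alpha=2.0, x1=5.0, steps=20000)
```

With these choices the iterate `estimate` should lie close to the root $\theta = 1$. The normal noise is unbounded, so this toy model belongs to the quasi-linear setting rather than the bounded one.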


Received 8/24/53.
¹ Research under contract with the Office of Naval Research.

Under certain assumptions regarding the nature of the functions $M(\cdot)$ and $H(\cdot \mid \cdot)$, to be specified in a moment, Robbins and Monro showed that $b_n \to 0$, namely that $x_n$ tends to $\theta$ in mean of order 2, for every sequence $\{a_n\}$ satisfying (2), thus providing a stochastic solution of the equation (1).

Robbins and Monro made two overall² assumptions: namely that
(A) $M(x) \lessgtr \alpha$ according as $x \lessgtr \theta$ (our Assumption (0) in Sec. 4); and
(B) the random variables $Y(x)$ are uniformly (in $x$) bounded with probability one (our Assumption (III) in Sec. 4).

Furthermore they needed either one of the following two sets of conditions:
(i) $\inf_{x \ne \theta} |M(x) - \alpha| \ge \delta > 0$;
(iia) $M(x)$ is nondecreasing and $M'(\theta) > 0$.

Wolfowitz [2] weakened the overall assumptions by keeping (A) but assuming only that $M(x)$ is bounded and that $Y(x)$ has a bounded (in $x$) variance. Furthermore he needed either the condition (i) above or
(iib) $M(x)$ is strictly increasing in a neighborhood of $x = \theta$ and is bounded away from $\alpha$ outside every such neighborhood.

Under these assumptions he proved that $x_n$ tends to $\theta$ in probability. Later several authors proved that $x_n$ tends to $\theta$ with probability one, under conditions yet unknown to us. We are not concerned with this question here. Very recently L. Schmetterer [7] gave some upper bounds for $b_n$, under assumptions which are essentially those of our second case; see footnote 5.

In this paper we shall study the finer properties of the process $\{x_n\}$, especially with regard to the moments of $x_n - \theta$ and the limiting (nondegenerate) distribution of $x_n - \theta$, suitably normalized. We shall not deal with the case (i), but shall treat two different cases.

Our first case (Sec. 4) requires a set of conditions which is similar³ to that in the second case of Robbins and Monro. In addition to their overall assumption we assume that
(ii) $M'(\theta) > 0$ and $M(x)$ is bounded away from $\alpha$ outside every neighborhood of $x = \theta$; (assumptions (I) and (II) in Sec. 4).

With these assumptions and a suitable choice of $\{a_n\}$ we can obtain upper bounds for the absolute moments $\beta_n^{(r)} = E[|x_n - \theta|^r]$ (Theorem 1). An important consequence, much used thereafter, is given in Theorem 2. In order to obtain lower bounds for $\beta_n^{(2)}$ (Theorem 3) we need the new assumption ((IV) in Sec. 4) that the variance of $Y(x)$ is bounded below uniformly in $x$. It is interesting to note that our choice of $a_n$ is $a_n = 1/n^{1-\varepsilon}$, with an $\varepsilon$ which has to be greater than some positive constant depending on $M(\cdot)$ and $H(\cdot \mid \cdot)$, but fortunately always compatible with $\varepsilon < \tfrac{1}{2}$. The upper and lower bounds for $\beta_n^{(2)}$ are at first not of the same order of magnitude, but they lead to Theorem 4 which in turn sharpens the bounds to their correct order (Theorem 5).

The question arises what happens if an intransigent statistician refuses to use our prescribed (range of) $\varepsilon$ and insists on using the simpler $a_n = c/n$, as suggested by Robbins and Monro. Our answer is that our method still leads to some estimate, in fact that $\beta_n^{(2)}$ has an order between $1/n$ and $1/(\log n)^{c'}$ if $c$ is chosen sufficiently large (where $c'$ depends on $c$ and tends to infinity with it), but we do not know even if it has a definite asymptotic order.

² This adjective need not be taken literally.
³ In fact, Theorems 1 and 2 below are proved under weaker conditions than theirs.

Returning to the moments, it seems vain hope to obtain a precise asymptotic formula for them without further hypotheses. We shall content ourselves with the "obvious" by strengthening our last assumption to requiring that $Y(x)$ has a constant variance independent⁴ of $x$. With this added force (Assumption (V) in Sec. 4) our method works smoothly and we finish with an asymptotically normal distribution for $a_n^{-1/2}(x_n - \theta)$ (Theorem 6).

We now turn to our second case (Sec. 5) which is disjoint from the first one. Here assumption (B) is replaced by the weaker one that
(C) $Y(x) - M(x)$ has bounded (in $x$) moments up to order $p$, where $p$ is an even integer.

Assumption (A) is kept but (ii) is replaced by
(iii) $M'(\theta) > 0$; $M(x)$ is bounded in any finite interval; and

$$0 < \liminf_{|x| \to \infty} \frac{M(x)}{x} \le \limsup_{|x| \to \infty} \frac{M(x)}{x} < \infty.$$

Admittedly this last condition is pretty strong and our only excuse is that of inability, and an invitation to weaken it by a better method. Our choice⁵ of $\{a_n\}$ is now $a_n = c/n$ with $c > 1/(2K)$, where $K = \inf_{x \ne \theta} [(M(x) - \alpha)/(x - \theta)] > 0$. With the assumptions (C) and (iii) we can prove that $\beta_n^{(r)}$ is at most of the order $n^{-r/2}$ for $0 < r \le p$ (Theorem 7). If $p = 6$ in (C) we can prove that $\beta_n^{(r)}$ is exactly of the order $n^{-r/2}$ for $2 < r \le p$ (Theorem 8). If we can take $p = \infty$ in (C) and further if $Y(x)$ has a constant variance independent of $x$, then just as in the first case⁴ we can prove asymptotic normality of $a_n^{-1/2}(x_n - \theta)$ (Theorem 9).
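The role of the threshold $c > 1/(2K)$ can be made concrete in the simplest special case. The sketch below is an illustration under assumptions that go beyond the text: $M$ is taken to be exactly linear, $M(x) = \alpha + K(x - \theta)$, and $Y(x) - M(x)$ has constant variance `sigma2`. Then $x_{n+1} - \theta = (1 - a_n K)(x_n - \theta) - a_n (y_n - M(x_n))$, so the second moment obeys the exact recursion $b_{n+1} = (1 - a_n K)^2 b_n + a_n^2 \sigma^2$. With $a_n = c/n$ and $2cK > 1$, matching the leading $1/n$ terms suggests $n\,b_n \to c^2\sigma^2/(2cK - 1)$. The function name and the parameter values are hypothetical.

```python
def second_moment_linear(K, c, sigma2, b1, n_end):
    """Iterate b_{n+1} = (1 - cK/n)**2 * b_n + (c/n)**2 * sigma2 and return b_{n_end}."""
    b = b1
    for n in range(1, n_end):
        gain = c / n  # the step size a_n = c / n
        b = (1.0 - gain * K) ** 2 * b + gain**2 * sigma2
    return b


K, c, sigma2 = 2.0, 1.0, 1.0          # here 2cK = 4 > 1
n_end = 100_000
scaled_b = n_end * second_moment_linear(K, c, sigma2, b1=4.0, n_end=n_end)
predicted = c**2 * sigma2 / (2.0 * c * K - 1.0)
```

In this run `scaled_b` should be close to `predicted`, that is, $b_n$ is of exact order $1/n$, in line with the order $n^{-r/2}$ at $r = 2$.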

The method we use in discussing both cases is elementary and depends on 
some simple analytical lemmas which we collect in Section 3. 

In Section 6 we discuss by a different method the case where $M(\cdot)$ is a linear function. Under the assumption that $Y(x) - M(x)$ has a fixed distribution function independent of $x$, the problem is reduced to a classical problem in probability theory. Various easy conclusions are then drawn which show that $x_n - \theta$ may have an asymptotic distribution which is stable but not normal, or it may have no asymptotic distribution whatsoever. The main interest of these observations is to serve as a foil to our previous results.

In Section 7 we discuss briefly the statistical implications of our results. 

⁴ The following weaker assumption suffices. The variance $E[(Y(x) - M(x))^2]$ as a function of $x$ is continuous and nonvanishing at $x = \theta$. Since by (III) it is bounded in $x$ and by Theorem 1 below $x_n \to \theta$ in probability it follows that $E[(y_n - M(x_n))^2] \to \sigma^2$ as $n \to \infty$ so that (4.15) implies that $e_n \to \sigma^2$ and this is all we need. This weaker assumption is culled from an unpublished MS. by J. L. Hodges, Jr. and E. L. Lehmann.

⁵ Schmetterer [7] gave upper bounds for $b_n$ for $a_n = n^{-(1-\varepsilon)}$, $0 < \varepsilon < \tfrac{1}{2}$; $a_n = n^{-1}$ (in this case the order of magnitude of the upper bound depends on $K$); and $a_n = cn^{-1}$ with $c > (2K)^{-1}$. The last case is covered by Theorem 7 below.







The formulas in each of the 7 sections are numbered separately. A formula 
in the same section is referred to simply by its number; a formula in a previous 
section is referred to by prefixing the section number, for example, (4.6) is 
formula (6) in Section 4. 


3. Lemmas. In this section we state and prove our principal mathematical 
tools. They seem to be new and can be further elaborated, but we state only 
what will be needed. The c’s are numerical constants. 


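LEMMA 1. Suppose that $\{b_n\}$, $n \ge 1$, is a sequence of real numbers such that for $n \ge n_0$,

(1)
$$b_{n+1} \le \Big(1 - \frac{c}{n}\Big) b_n + \frac{c_1}{n^{p+1}},$$

where $c > p > 0$, $c_1 > 0$. Then

(2)
$$b_n \le \frac{c_1}{c - p}\,\frac{1}{n^p} + O\Big(\frac{1}{n^{p+1}} + \frac{1}{n^c}\Big).$$

PROOF. There is no loss of generality if we take $n_0 = 1$. We have

$$\frac{1}{(n+1)^p} - \Big(1 - \frac{c}{n}\Big)\frac{1}{n^p} = \frac{c}{n^{p+1}} - \Big(\frac{1}{n^p} - \frac{1}{(n+1)^p}\Big) = \frac{c - p}{n^{p+1}} + O\Big(\frac{1}{n^{p+2}}\Big).$$

Hence for some $c_2 > 0$,

(3)
$$\frac{1}{n^{p+1}} \le \frac{1}{c - p}\Big[\frac{1}{(n+1)^p} - \Big(1 - \frac{c}{n}\Big)\frac{1}{n^p}\Big] + \frac{c_2}{n^{p+2}}.$$

Similarly but more roughly, for some $c_3 > 0$,

$$\frac{c_1 c_2}{n^{p+2}} \le c_3\Big[\frac{1}{(n+1)^{p+1}} - \Big(1 - \frac{c}{n}\Big)\frac{1}{n^{p+1}}\Big].$$

Using these inequalities in (1) and rearranging terms, we obtain

$$b_{n+1} - \frac{c_1}{c - p}\,\frac{1}{(n+1)^p} - \frac{c_3}{(n+1)^{p+1}} \le \Big(1 - \frac{c}{n}\Big)\Big[b_n - \frac{c_1}{c - p}\,\frac{1}{n^p} - \frac{c_3}{n^{p+1}}\Big].$$

Let the quantity on the left side be denoted by $b'_{n+1}$. If for some $n > c$ we have $b'_n \le 0$, then this is true for all subsequent $n$, namely

$$b_n \le \frac{c_1}{c - p}\,\frac{1}{n^p} + \frac{c_3}{n^{p+1}}.$$

Otherwise for every $n > n_1 > c$ we have

$$0 < b'_n \le b'_{n_1} \prod_{m=n_1}^{n-1}\Big(1 - \frac{c}{m}\Big) = O\Big(\frac{1}{n^c}\Big).$$

In either case (2) is true.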
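A quick numerical check of the constant in (2) is possible when (1) holds with equality. The sketch below is only an illustration: the values $c = 3$, $p = 1$, $c_1 = 2$, the starting index and the starting value are arbitrary choices, and `iterate_lemma1` is a hypothetical helper name. The iteration starts at $n = 4 > c$ so that every factor $1 - c/n$ is positive, as in the proof.

```python
def iterate_lemma1(c, p, c1, b_start, n_start, n_end):
    """Iterate b_{n+1} = (1 - c/n) b_n + c1 / n**(p+1) from n_start and return b_{n_end}."""
    b = b_start
    for n in range(n_start, n_end):
        b = (1.0 - c / n) * b + c1 / n ** (p + 1)
    return b


c, p, c1 = 3.0, 1.0, 2.0
n_end = 100_000
# n^p * b_n should approach the leading constant c1 / (c - p) of (2).
scaled = n_end**p * iterate_lemma1(c, p, c1, b_start=1.0, n_start=4, n_end=n_end)
limit = c1 / (c - p)
```

Here `scaled` should agree with `limit` up to a term of order $1/n$, which is what the $O$-term in (2) predicts after multiplication by $n^p$.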


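LEMMA 2. Suppose that $\{b_n\}$, $n \ge 1$, is a sequence of real numbers such that for $n \ge n_0$,

$$b_{n+1} \ge \Big(1 - \frac{c}{n}\Big) b_n + \frac{c_1}{n^{p+1}},$$

where $c > p > 0$, $c_1 > 0$. Then

$$b_n \ge \frac{c_1}{c - p}\,\frac{1}{n^p} + O\Big(\frac{1}{n^{p+1}} + \frac{1}{n^c}\Big).$$

PROOF. The proof is entirely similar to that of Lemma 1. The point is that (3) may be changed into

$$\frac{c_1}{n^{p+1}} \ge \frac{c_1}{c - p}\Big[\frac{1}{(n+1)^p} - \Big(1 - \frac{c}{n}\Big)\frac{1}{n^p}\Big] - \frac{c_1 c_2}{n^{p+2}}.$$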


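LEMMA 3. Suppose that $\{b_n\}$, $n \ge 1$, is a sequence of real numbers such that for $n \ge n_0$,

(4)
$$b_{n+1} \ge \Big(1 - \frac{c}{n^s}\Big) b_n + \frac{c_1}{n^t},$$

where $0 < s < 1$, $s < t$, $c > 0$, $c_1 > 0$. Then

$$\liminf_{n \to \infty} n^{t-s} b_n \ge \frac{c_1}{c}.$$

PROOF. We may take $n_0 = 1$. We have

$$\frac{1}{(n+1)^{t-s}} - \Big(1 - \frac{c}{n^s}\Big)\frac{1}{n^{t-s}} \le \frac{c}{n^t}.$$

Hence we have

$$\frac{c_1}{n^t} \ge \frac{c_1}{c}\Big[\frac{1}{(n+1)^{t-s}} - \Big(1 - \frac{c}{n^s}\Big)\frac{1}{n^{t-s}}\Big].$$

Using this inequality in (4), we obtain

$$b_{n+1} - \frac{c_1}{c\,(n+1)^{t-s}} \ge \Big(1 - \frac{c}{n^s}\Big)\Big(b_n - \frac{c_1}{c\,n^{t-s}}\Big).$$

If for some $n > c^{1/s}$ we have $b_n \ge c_1/(c\,n^{t-s})$, then this is true for all subsequent $n$. Otherwise for every $n > n_1 > c^{1/s}$ we have

$$0 > b_n - \frac{c_1}{c\,n^{t-s}} \ge \Big(b_{n_1} - \frac{c_1}{c\,n_1^{t-s}}\Big) \prod_{m=n_1}^{n-1}\Big(1 - \frac{c}{m^s}\Big) = O\Big(\frac{1}{n^q}\Big)$$

for every $q > 0$. The lemma follows in either case.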
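LEMMA 4. Suppose that $\{b_n\}$, $n \ge 1$, is a sequence of real numbers such that for $n \ge n_0$,

(5)
$$b_{n+1} \le \Big(1 - \frac{c_n}{n^s}\Big) b_n + \frac{c'}{n^t},$$

where $0 < s < 1$, $s < t$, $c_n \ge c > 0$, $c' > 0$. Then

$$\limsup_{n \to \infty} n^{t-s} b_n \le \frac{c'}{c}.$$

PROOF. We have, if $c''$ is any number $> c'$,

$$\frac{c'}{n^t} \le \frac{c''}{c}\Big[\frac{1}{(n+1)^{t-s}} - \Big(1 - \frac{c}{n^s}\Big)\frac{1}{n^{t-s}}\Big] \le \frac{c''}{c}\Big[\frac{1}{(n+1)^{t-s}} - \Big(1 - \frac{c_n}{n^s}\Big)\frac{1}{n^{t-s}}\Big]$$

for all $n > n_0(c'')$. Using this inequality in (5), we obtain

$$b_{n+1} - \frac{c''}{c\,(n+1)^{t-s}} \le \Big(1 - \frac{c_n}{n^s}\Big)\Big(b_n - \frac{c''}{c\,n^{t-s}}\Big).$$

The rest follows as in the proof of Lemma 3.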
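Lemmas 3 and 4 together pin down the limit when the recursion holds with equality and $c_n \equiv c$: the lower bound of Lemma 3 and the upper bound of Lemma 4 (with $c' = c_1$) then coincide at $c_1/c$. The sketch below checks this numerically. The parameter values and the helper name `iterate_slow_gain` are hypothetical, and the iteration starts at $n = 2 > c^{1/s}$ so that each factor $1 - c/n^s$ is positive.

```python
def iterate_slow_gain(c, s, c1, t, b_start, n_start, n_end):
    """Iterate b_{n+1} = (1 - c/n**s) b_n + c1 / n**t from n_start and return b_{n_end}."""
    b = b_start
    for n in range(n_start, n_end):
        b = (1.0 - c / n**s) * b + c1 / n**t
    return b


c, s, c1, t = 1.0, 0.5, 2.0, 1.5      # 0 < s < 1 and s < t
n_end = 200_000
# n^(t-s) * b_n should approach c1 / c.
scaled_value = n_end ** (t - s) * iterate_slow_gain(
    c, s, c1, t, b_start=1.0, n_start=2, n_end=n_end
)
```

Convergence is slower than in Lemma 1: the gap between `scaled_value` and $c_1/c$ decays roughly like $n^{-(1-s)}$ in this example, so a fairly large `n_end` is used.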


4. First case. In this section we treat a case which is essentially the second, "more interesting," case of Robbins and Monro. The various assumptions needed will be listed below. Not all of them are used in every theorem we shall prove.

In the following, $K_0, K_1, \cdots$ are positive constants which depend only on the nature of the functions $M(\cdot)$ and $H(\cdot \mid \cdot)$, and $K_3, K_4, \cdots$ are positive constants which depend moreover on the choice of $\{a_n\}$. If these constants happen to depend also on some new parameter $\delta$ (say), this dependence will be indicated by the usual parentheses. They are numbered in order of appearance. We use also the customary $O$ and $o$ notations, as we have already done in Section 3, it being understood that the constants involved may depend on $M(\cdot)$, $H(\cdot \mid \cdot)$ and $\{a_n\}$. The initial value $x_1$ of the process is supposed to be a fixed constant or at least a random variable which is bounded by a fixed constant with probability one.

ASSUMPTION (0). $M(\cdot)$ is a Borel measurable function; $M(\theta) = \alpha$, and $(x - \theta)(M(x) - \alpha) > 0$ for all $x \ne \theta$. This assumption will be used throughout the paper and will not always be explicitly mentioned.

ASSUMPTION (I). We have $M'(\theta) > 0$, namely, as $x - \theta \to 0$,

$$M(x) = \alpha + \alpha_1 (x - \theta) + o(|x - \theta|), \qquad 0 < \alpha_1 < \infty.$$

ASSUMPTION (II). For every $\delta > 0$ we have

$$\inf_{|x - \theta| > \delta} |M(x) - \alpha| = K_0(\delta) > 0.$$

ASSUMPTION (III). For all $x$ we have

$$\Pr(|Y(x) - \alpha| \le K_1) = 1.$$

ASSUMPTION (IV). For all $x$, we have

$$E[(Y(x) - M(x))^2] \ge K_2 > 0.$$

In this section we set for $n \ge 1$

(1)  $a_n = 1/n^{1-\varepsilon}$,  $0 < \varepsilon < \tfrac{1}{2}$,

where $\varepsilon$ is to be chosen later.
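That the choice (1) is admissible in the Robbins-Monro procedure, that is, $\sum a_n = \infty$ and $\sum a_n^2 < \infty$, follows from comparison with integrals exactly when $0 < \varepsilon < \tfrac{1}{2}$. The sketch below illustrates this with partial sums for the arbitrary value $\varepsilon = 0.25$. The helper name `step_size_partial_sums` is hypothetical, and the two bounds are the standard integral-test estimates rather than anything stated in the text.

```python
def step_size_partial_sums(eps, n_terms):
    """Return (sum of a_n, sum of a_n**2) for a_n = n**-(1 - eps), n = 1..n_terms."""
    total = 0.0
    total_sq = 0.0
    for n in range(1, n_terms + 1):
        a_n = n ** -(1.0 - eps)
        total += a_n
        total_sq += a_n * a_n
    return total, total_sq


eps = 0.25
n_terms = 100_000
total, total_sq = step_size_partial_sums(eps, n_terms)
# Integral test: sum_{n<=N} n^-(1-eps) >= ((N+1)^eps - 1) / eps, which is unbounded in N.
lower = ((n_terms + 1) ** eps - 1.0) / eps
# Integral test: sum_n n^-(2-2 eps) <= 1 + 1 / (1 - 2 eps), finite because eps < 1/2.
upper_sq = 1.0 + 1.0 / (1.0 - 2.0 * eps)
```

The partial sum `total` keeps growing like $N^{\varepsilon}/\varepsilon$, while `total_sq` stays below `upper_sq` for every number of terms.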
We record some simple consequences of the assumptions and the choice of $\{a_n\}$.

First, it follows from (III) that

(IIIa)  $|M(x) - \alpha| \le K_1$

and that all moments of $Y(x)$ are bounded by constants $K$ which depend on the order of the moment but not on $x$.

The Robbins-Monro procedure yields

$$x_n = x_1 + \sum_{k=1}^{n-1} a_k(\alpha - y_k).$$

From (III) we have with probability one

(2)
$$|x_n - \theta| \le |x_1 - \theta| + K_1 \sum_{k=1}^{n-1} \frac{1}{k^{1-\varepsilon}} \le \frac{K_3}{\varepsilon}\,n^{\varepsilon}.$$

From (I) it follows that to every $\gamma$, $0 < \gamma < 1$, there corresponds a $\delta = \delta(\gamma)$ such that

$$|M(x_n) - \alpha| \ge \gamma \alpha_1 |x_n - \theta| \quad \text{if } |x_n - \theta| \le \delta.$$

From (II) and (2), it follows that the conditional probability is one that

$$|M(x_n) - \alpha| \ge [K_0(\delta)\,\varepsilon / K_3 n^{\varepsilon}]\,|x_n - \theta| \quad \text{if } |x_n - \theta| > \delta.$$

Namely, the inequality for $|M(x_n) - \alpha|$ holds almost everywhere on the set $|x_n - \theta| > \delta$. Together we have with probability one

(3)  $|M(x_n) - \alpha| \ge K_4\,\varepsilon\,n^{-\varepsilon}\,|x_n - \theta|$.


This constant $K_4$ is of extreme importance in the analysis below. Of course, as given in (3), it does not depend on $n$. However, for $n \to \infty$ we can determine its asymptotic value as follows. For a given $\gamma$, $0 < \gamma < 1$, let $\delta_0(\gamma)$ be the supremum of all $\delta$ such that

$$|M(x) - \alpha| \ge \gamma \alpha_1 |x - \theta| \quad \text{for } |x - \theta| \le \delta.$$

Then for $n \ge n_0(\gamma, \delta', x_1 - \theta)$ the $K_4$ in (3) may be taken to be $K_0(\delta_0(\gamma))/K_1 - \delta'$ where $\delta' > 0$ is arbitrarily small. If $\lim_{\gamma \to 0} \delta_0(\gamma) = \delta_0$, the $K_4$ in (3) may be taken asymptotically as $n \to \infty$ to be $K_0(\delta_0)/K_1$.

Finally, we note that from (I) and (IIIa) it follows that

(4)  $|M(x) - \alpha| \le K_5 |x - \theta|$.


THEOREM 1. Suppose that the assumptions (I), (II) and (III) are satisfied. If $a_n = n^{-(1-\varepsilon)}$, with $1/[2(1 + K_4)] < \varepsilon < \tfrac{1}{2}$, then for each real $r > 0$, we have

(5)  $\beta_n^{(r)} \le K_6(r)\,n^{-r(1 - 2\varepsilon)/2}$.

PROOF. We write as in [1]

$$d_n = E[(x_n - \theta)(M(x_n) - \alpha)], \qquad e_n = E[(y_n - \alpha)^2].$$

The proof is divided into three stages: (i) $r = 2$, (ii) all even integer $r$, (iii) all real $r$. It is obvious that (iii) follows from (ii) by Lyapunov's (or Hölder's) inequality.





470 K. L. CHUNG 


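(i) The following equation is given in [1]:
$$b_{n+1} = b_n - 2a_n d_n + a_n^2 e_n. \tag{6}$$
By (3), we have, using (0), $d_n \ge K_4 \varepsilon n^{-\varepsilon} b_n$. By (III), we have $e_n \le K_1^2$. Using
these in (6), we obtain
$$b_{n+1} \le \Big(1 - \frac{2K_4\varepsilon}{n^{\varepsilon}\, n^{1-\varepsilon}}\Big) b_n + \frac{K_1^2}{n^{2-2\varepsilon}} = \Big(1 - \frac{2K_4\varepsilon}{n}\Big) b_n + \frac{K_1^2}{n^{1+\lambda}} \tag{7}$$
where we have put, once for all, $0 < \lambda = 1 - 2\varepsilon < 1$. If $\varepsilon > 1/2(1 + K_4)$,
then $2K_4\varepsilon > \lambda$. Applying Lemma 1 to (7) with $c = 2K_4\varepsilon$, $c_1 = K_1^2$, $p = \lambda$, we
obtain
$$b_n \le \frac{K_1^2}{2(1 + K_4)\varepsilon - 1}\, \frac{1}{n^{\lambda}} + O\Big(\frac{1}{n^{1+\lambda}} + \frac{1}{n^{2K_4\varepsilon}}\Big).$$
Thus (5) is proved for $r = 2$.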
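The way Lemma 1 is used here can be illustrated numerically. The sketch below assumes, from the way the lemma is applied in this proof, that it concerns recursions of the form $b_{n+1} \le (1 - c/n) b_n + c_1/n^{p+1}$ with $c > p > 0$ and yields the leading term $c_1/(c - p) \cdot n^{-p}$; the lemma itself is stated elsewhere in the paper. The code iterates the recursion with equality and compares $n^p b_n$ with $c_1/(c-p)$. The numerical values of $c$, $c_1$, $p$ and the starting point are arbitrary choices.

```python
def iterate_recursion(c, c1, p, n_max, b_start=1.0, n_start=2):
    """Return b_{n_max} for b_{n+1} = (1 - c/n) * b_n + c1 / n**(p + 1)."""
    b = b_start
    for n in range(n_start, n_max):
        b = (1.0 - c / n) * b + c1 / n ** (p + 1.0)
    return b


# Arbitrary illustrative constants with c > p > 0.
c, c1, p = 1.5, 2.0, 0.5
n_max = 200000

scaled = n_max ** p * iterate_recursion(c, c1, p, n_max)
limit = c1 / (c - p)
print("n^p * b_n =", scaled, " c1/(c - p) =", limit)
```

In the proof above the roles are $c = 2K_4\varepsilon$, $c_1 = K_1^2$ and $p = \lambda$, so that $c - p = 2(1 + K_4)\varepsilon - 1$.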


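(ii) We use induction on even $r$. Suppose then that $r$ is even and that
$$\beta_n^{(t)} \le K_7(t)\, n^{-t\lambda/2}, \qquad 2 \le t \le r - 2. \tag{5 bis}$$
Recalling that $x_{n+1} - \theta = x_n - \theta + a_n(\alpha - y_n)$, we have, as a generalization
of (6), for all integer $r \ge 1$,
$$b_{n+1}^{(r)} = E\Big[\int \{x_n - \theta - a_n(y - \alpha)\}^r \, dH(y \mid x_n)\Big]
= b_n^{(r)} - r a_n E[(x_n - \theta)^{r-1}(M(x_n) - \alpha)] + \sum_{t=2}^{r} (-1)^t \binom{r}{t} J_t \tag{8}$$
where
$$J_t = J_t^{(r)} = a_n^t\, E[(x_n - \theta)^{r-t}(y_n - \alpha)^t]. \tag{9}$$
By (III) we have $|J_t| \le K_1^t\, n^{-t(1-\varepsilon)} \beta_n^{(r-t)}$. Hence by the induction hypothesis
(5 bis) we have $|J_t| \le K_8\, n^{-(t + r\lambda)/2}$. Therefore we have
$$\Big| \sum_{t=2}^{r} (-1)^t \binom{r}{t} J_t \Big| \le \frac{K_9}{n^{1 + r\lambda/2}}.$$
By (3), we have since $r$ is even (using Assumption (0))
$$E[(x_n - \theta)^{r-1}(M(x_n) - \alpha)] \ge K_4 \varepsilon n^{-\varepsilon} b_n^{(r)}.$$
Using these inequalities in (8), we obtain
$$b_{n+1}^{(r)} \le (1 - rK_4\varepsilon/n)\, b_n^{(r)} + K_{10}\, n^{-(1 + r\lambda/2)}. \tag{10}$$
Our choice of $\varepsilon$ makes $rK_4\varepsilon > r\lambda/2$. Applying Lemma 1 to (10) with $c = rK_4\varepsilon$,
$c_1 = K_{10}$, $p = r\lambda/2$, we obtain
$$b_n^{(r)} \le \frac{2K_{10}}{r[2(1 + K_4)\varepsilon - 1]}\, \frac{1}{n^{r\lambda/2}}.$$
This completes the induction, thus proving (5) for all even integers $r$.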


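THEOREM 2. Under the same hypotheses as in Theorem 1, we have, for every
$\delta > 0$, $r \ge 0$ and $q > 0$,
$$\int_{|x_n - \theta| > \delta} |x_n - \theta|^r \, d\Pr = O(n^{-q}).$$
PROOF. By (2) and Chebyshev's inequality, we have for every positive
integer $s$,
$$\int_{|x_n - \theta| > \delta} |x_n - \theta|^r \, d\Pr \le K_{11}\, n^{r\varepsilon} \Pr(|x_n - \theta| > \delta) \le K_{11}\, n^{r\varepsilon} \beta_n^{(2s)}/\delta^{2s} = O(n^{r\varepsilon - s\lambda}).$$
It remains only to choose $s$ so that $s\lambda - r\varepsilon > q$.

THEOREM 3. Suppose Assumptions (I), (III) and (IV) are satisfied. If
$a_n = n^{-(1-\varepsilon)}$ with $0 < \varepsilon < \tfrac{1}{2}$, then for each integer $r \ge 2$, we have
$$\liminf_{n \to \infty} n^{(1-\varepsilon)r/2} \beta_n^{(r)} \ge K_{12}. \tag{11}$$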


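PROOF. By Lyapunov's inequality we need only prove (11) for $r = 2$. By
(4), we have $d_n \le K_5 b_n$. By (IV), we have $e_n \ge K_2$. Using these in (6), we
obtain
$$b_{n+1} \ge (1 - 2K_5 n^{-(1-\varepsilon)})\, b_n + K_2\, n^{-2(1-\varepsilon)}.$$
Applying Lemma 3 with $s = 1 - \varepsilon$, $t = 2 - 2\varepsilon$, $c = 2K_5$, $c' = K_2$, we obtain
$$\liminf_{n \to \infty} n^{1-\varepsilon} b_n \ge K_2/2K_5.$$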


REMARK. If in Theorem 3 we choose $a_n = c/n$ with a sufficiently large $c$, we
obtain $b_n \ge K_{13}(c)/n$. We do not need this result in this section; but see Section 5.

THEOREM 4. Suppose that the Assumptions (I) to (IV) are satisfied. If $a_n = n^{-(1-\varepsilon)}$
with $1/2(1 + K_4) < \varepsilon < \tfrac{1}{2}$, we have
$$\lim_{n \to \infty} (d_n / b_n) = \alpha_1 = M'(\theta).$$


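PROOF. Given any small $\eta > 0$, there exists a $\delta = \delta(\eta)$ such that
$$|M(x) - \alpha - \alpha_1(x - \theta)| \le \eta |x - \theta| \quad \text{if } |x - \theta| \le \delta.$$
Hence
$$\int_{|x_n - \theta| \le \delta} (x_n - \theta)(M(x_n) - \alpha)\, d\Pr
= \alpha_1 \int_{|x_n - \theta| \le \delta} (x_n - \theta)^2 \, d\Pr + \eta' \int_{|x_n - \theta| \le \delta} (x_n - \theta)^2 \, d\Pr
= \alpha_1 b_n - \alpha_1 \int_{|x_n - \theta| > \delta} (x_n - \theta)^2 \, d\Pr + \eta'' b_n$$
where $|\eta''| \le |\eta'| \le \eta$. On the other hand, by (IIIa) and Theorem 2
$$\Big| \int_{|x_n - \theta| > \delta} (x_n - \theta)(M(x_n) - \alpha)\, d\Pr \Big| \le K_1 \int_{|x_n - \theta| > \delta} |x_n - \theta| \, d\Pr = O(n^{-q})$$
for every $q > 0$. It follows from Theorem 3 that
$$\int_{|x_n - \theta| > \delta} (x_n - \theta)(M(x_n) - \alpha)\, d\Pr = o(b_n).$$
Combining these results, we obtain $d_n = \alpha_1 b_n + \eta'' b_n + o(b_n)$. This proves
Theorem 4 since $\eta$ is arbitrarily small.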
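THEOREM 5. Under the same assumptions as in Theorem 4, we have
$$K_2/2\alpha_1 \le \liminf_{n \to \infty} n^{1-\varepsilon} b_n \le \limsup_{n \to \infty} n^{1-\varepsilon} b_n \le K_1^2/2\alpha_1. \tag{12}$$
Moreover for every integer $r \ge 2$
$$0 < K_{14}(r) \le \liminf_{n \to \infty} n^{(1-\varepsilon)r/2} \beta_n^{(r)} \le \limsup_{n \to \infty} n^{(1-\varepsilon)r/2} \beta_n^{(r)} \le K_{15}(r) < \infty. \tag{13}$$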


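PROOF. By Theorem 4, for any small $\delta > 0$, if $n > n_0(\delta)$ we have
$$(\alpha_1 - \delta) b_n \le d_n \le (\alpha_1 + \delta) b_n.$$
Recalling that we have $K_2 \le e_n \le K_1^2$, we have for $n > n_0(\delta)$ on the one hand,
$$b_{n+1} \le (1 - 2(\alpha_1 - \delta) n^{-(1-\varepsilon)})\, b_n + K_1^2\, n^{-2(1-\varepsilon)};$$
and on the other hand
$$b_{n+1} \ge (1 - 2(\alpha_1 + \delta) n^{-(1-\varepsilon)})\, b_n + K_2\, n^{-2(1-\varepsilon)}.$$
Lemmas 3 and 4 yield
$$K_2/2(\alpha_1 + \delta) \le \liminf_{n \to \infty} n^{1-\varepsilon} b_n \le \limsup_{n \to \infty} n^{1-\varepsilon} b_n \le K_1^2/2(\alpha_1 - \delta).$$
This proves (12) since $\delta$ is arbitrary. The left half of (13) follows now from
Lyapunov's inequality. The proof of the right half of (13) is entirely similar to
part (ii) of the proof of Theorem 1, modified according to the proof of Theorem
3. We find that, if $r$ is even and if $\limsup_{n \to \infty} n^{(1-\varepsilon)t/2} \beta_n^{(t)} \le K_{15}(t)$ for $2 \le t \le r - 2$,
then for all $n > n_1(\delta, \varepsilon)$ we have
$$b_{n+1}^{(r)} \le (1 - r(\alpha_1 - \delta) n^{-(1-\varepsilon)})\, b_n^{(r)} + K_{16}\, n^{-(1-\varepsilon)(1 + r/2)}$$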


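where $K_{16}$ does not depend on $\varepsilon$. This explains the reason why the constants
$K_{14}(r)$ and $K_{15}(r)$ do not depend on $\varepsilon$. Naturally, (13) is not equivalent to
$$K_{14}(r)\, n^{-(1-\varepsilon)r/2} \le \beta_n^{(r)} \le K_{15}(r)\, n^{-(1-\varepsilon)r/2}$$
for all $n$, since in this form $\varepsilon$ enters through the absorbed error terms.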
It is possible to give explicit bounds for the constants $K_{14}(r)$ and $K_{15}(r)$ by
proceeding inductively. However, it seems more interesting to study a case
in which there is an asymptotic formula for $b_n$, in other words, where the limits
in (13) may be replaced by a unique limit. For this purpose we shall make a
further assumption which strengthens (IV).

ASSUMPTION (V). For all $x$
$$E[(Y(x) - M(x))^2] = \sigma^2 > 0$$
where $\sigma^2$ is a constant which does not depend on $x$. This assumption states that
all the distributions $H(y \mid x)$ have the same (positive) variance. Even the much
stronger assumption, that there is a fixed distribution $H_0(y)$ such that $H(y \mid x) =
H_0(y - M(x))$, seems reasonable in many applications.


THEOREM 6. Suppose that the Assumptions (I), (II), (III) and (V) are satisfied.
If $a_n = n^{-(1-\varepsilon)}$ with $1/2(1 + K_4) < \varepsilon < \tfrac{1}{2}$, then we have for every integer $r \ge 1$
$$\lim_{n \to \infty} n^{(1-\varepsilon)r/2}\, b_n^{(r)} =
\begin{cases}
0 & \text{if } r = 2s - 1, \\
(\sigma^2/2\alpha_1)^s (2s - 1)(2s - 3) \cdots 3 \cdot 1 & \text{if } r = 2s.
\end{cases}$$
Consequently the random variable $n^{(1-\varepsilon)/2}(x_n - \theta)$ tends in distribution to the normal
distribution with mean 0 and variance $\sigma^2/(2\alpha_1)$.
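PROOF. (i) $r = 1$. We have
$$b_{n+1}^{(1)} = E[(x_n - \theta) - a_n(y_n - \alpha)] = b_n^{(1)} - a_n E[M(x_n) - \alpha]. \tag{14}$$
By (I), there exists a $\delta = \delta(\alpha_1/2)$ such that
$$\int_{|x_n - \theta| \le \delta} (M(x_n) - \alpha)\, d\Pr = (\alpha_1 + \eta') \int_{|x_n - \theta| \le \delta} (x_n - \theta)\, d\Pr
= (\alpha_1 + \eta')\, b_n^{(1)} - (\alpha_1 + \eta') \int_{|x_n - \theta| > \delta} (x_n - \theta)\, d\Pr$$
where $|\eta'| \le \alpha_1/2$. Hence by Theorem 2, for every $q > 0$ we have
$$\int_{|x_n - \theta| \le \delta} (M(x_n) - \alpha)\, d\Pr = (\alpha_1 + \eta')\, b_n^{(1)} + O(n^{-q}).$$
On the other hand we have by Theorem 2,
$$\Big| \int_{|x_n - \theta| > \delta} (M(x_n) - \alpha)\, d\Pr \Big| \le K_1 \Pr(|x_n - \theta| > \delta) = O(n^{-q}).$$
Together we have
$$E[M(x_n) - \alpha] = \alpha(n)\, b_n^{(1)} + O(n^{-q})$$
where $\alpha(n) = \alpha_1 - (\alpha_1/2)\, \mathrm{sgn}\, b_n^{(1)} \ge \alpha_1/2$. Using this inequality in (14), we obtain
$$b_{n+1}^{(1)} \le (1 - \alpha(n)\, n^{-(1-\varepsilon)})\, b_n^{(1)} + K_{17}\, n^{-q}.$$
Applying Lemma 4 with $s = 1 - \varepsilon$, $t = q$, $c_n = \alpha(n)$, $c = \alpha_1/2$, $c' = K_{17}$ we
obtain $b_n^{(1)} = O(n^{-p})$ for every $p > 0$.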
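(ii) $r = 2$. We have, by (V)
$$e_n = E[(y_n - M(x_n))^2 + (M(x_n) - \alpha)^2] = \sigma^2 + E[(M(x_n) - \alpha)^2]. \tag{15}$$
By (4), $E[(M(x_n) - \alpha)^2] \le K_5^2 b_n$. Hence given any $\delta > 0$, there exists $n_0 = n_0(\delta)$
such that if $n > n_0$,
$$|e_n - \sigma^2| \le \delta.$$
Moreover by Theorem 4, this $n_0$ may be also chosen so that if $n > n_0$
$$(\alpha_1 - \delta) b_n \le d_n \le (\alpha_1 + \delta) b_n.$$
Using these in (6), we obtain
$$\Big(1 - \frac{2(\alpha_1 + \delta)}{n^{1-\varepsilon}}\Big) b_n + \frac{\sigma^2 - \delta}{n^{2-2\varepsilon}} \le b_{n+1} \le \Big(1 - \frac{2(\alpha_1 - \delta)}{n^{1-\varepsilon}}\Big) b_n + \frac{\sigma^2 + \delta}{n^{2-2\varepsilon}}.$$
Applying Lemmas 3 and 4 we obtain
$$\frac{\sigma^2 - \delta}{2(\alpha_1 + \delta)} \le \liminf_{n \to \infty} n^{1-\varepsilon} b_n \le \limsup_{n \to \infty} n^{1-\varepsilon} b_n \le \frac{\sigma^2 + \delta}{2(\alpha_1 - \delta)}.$$
Since $\delta$ is arbitrary it follows that $\lim_{n \to \infty} n^{1-\varepsilon} b_n = \sigma^2/2\alpha_1$.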
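(iii) Induction. Let $r$ be an integer $> 2$. It follows from (I) that, for every
$\eta > 0$, there is a $\delta = \delta(\eta)$ such that
$$E[(x_n - \theta)^{r-1}(M(x_n) - \alpha)] = (\alpha_1 + \eta') \int_{|x_n - \theta| \le \delta} (x_n - \theta)^r \, d\Pr + O(n^{-q})
= (\alpha_1 + \eta')\, b_n^{(r)} + O(n^{-q})$$
where $|\eta'| \le \eta$ and $q > 0$ is arbitrary, by Theorem 2. Moreover, by (V), (4)
and Theorem 5
$$E[(x_n - \theta)^{r-2}(y_n - \alpha)^2] = E[\{\sigma^2 + (M(x_n) - \alpha)^2\}(x_n - \theta)^{r-2}]
= \sigma^2 b_n^{(r-2)} + O(\beta_n^{(r)}) = \sigma^2 b_n^{(r-2)} + O(n^{-r(1-\varepsilon)/2}).$$
Hence by (9),
$$J_2 = \sigma^2 b_n^{(r-2)} n^{-2(1-\varepsilon)} + O(n^{-(r+4)(1-\varepsilon)/2})$$
and if $3 \le t \le r$
$$|J_t| = O(n^{-t(1-\varepsilon)} \beta_n^{(r-t)}) = O(n^{-(r+t)(1-\varepsilon)/2}) = O(n^{-(r+3)(1-\varepsilon)/2}).$$
Substituting these estimates into (8), we obtain
$$b_{n+1}^{(r)} = (1 - r(\alpha_1 + \eta') n^{-(1-\varepsilon)})\, b_n^{(r)} + \binom{r}{2} \sigma^2 b_n^{(r-2)} n^{-2(1-\varepsilon)} + O(n^{-(r+3)(1-\varepsilon)/2}). \tag{16}$$
Now assume as our induction hypothesis that $\lim_{n \to \infty} n^{(r-2)(1-\varepsilon)/2}\, b_n^{(r-2)} = B_{r-2}$.
It follows that if $n > n_0(\eta)$,
$$b_{n+1}^{(r)} = (1 - r(\alpha_1 + \eta') n^{-(1-\varepsilon)})\, b_n^{(r)} + \binom{r}{2} (\sigma^2 + \eta'') B_{r-2}\, n^{-(r+2)(1-\varepsilon)/2} + O(n^{-(r+3)(1-\varepsilon)/2})$$
where $|\eta'| \le \eta$ and $|\eta''| \le \eta$. An application of Lemmas 3 and 4 yields
$$\lim_{n \to \infty} n^{(1-\varepsilon)r/2}\, b_n^{(r)} = \binom{r}{2} \sigma^2 B_{r-2}/r\alpha_1 = (r - 1)\, \sigma^2 B_{r-2}/2\alpha_1.$$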





From (i) and (ii) above we have $B_1 = 0$ and $B_2 = \sigma^2/2\alpha_1$; hence it follows now
by induction that for each integer $r \ge 1$
$$B_r =
\begin{cases}
0 & \text{if } r = 2s - 1, \\
(\sigma^2/2\alpha_1)^s (2s - 1)(2s - 3) \cdots 3 \cdot 1 & \text{if } r = 2s.
\end{cases}$$
This proves the first part of Theorem 6; the second part is a well known
consequence. We may remark, for the benefit of future textbook writers, that here
is another instance in which the method of moments seems to apply more easily
than the more modern method of characteristic functions (see however Sec. 6).
This method of moments is not mentioned in several books on probability and
statistics.
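The recursion $B_r = (r - 1)\sigma^2 B_{r-2}/2\alpha_1$ obtained in stage (iii), started from $B_1 = 0$ and $B_2 = \sigma^2/2\alpha_1$, can be compared with the closed form in Theorem 6, which is the $r$th moment of a normal distribution with mean 0 and variance $\sigma^2/2\alpha_1$. A minimal sketch follows; the function names and the numerical values of $\sigma^2$ and $\alpha_1$ are illustrative assumptions.

```python
def limit_moment_recursive(r, sigma2, alpha1):
    """B_r from B_r = (r - 1) * sigma2 * B_{r-2} / (2 * alpha1), with B_0 = 1, B_1 = 0."""
    if r % 2 == 1:
        return 0.0
    b = 1.0  # B_0, the zeroth moment
    for t in range(2, r + 1, 2):
        b *= (t - 1) * sigma2 / (2.0 * alpha1)
    return b


def limit_moment_closed_form(r, sigma2, alpha1):
    """(sigma2 / (2 * alpha1))**s * (2s - 1)(2s - 3)...3*1 for r = 2s, and 0 for odd r."""
    if r % 2 == 1:
        return 0.0
    s = r // 2
    double_factorial = 1
    for j in range(1, 2 * s, 2):
        double_factorial *= j
    return (sigma2 / (2.0 * alpha1)) ** s * double_factorial


# Illustrative values: variance sigma2 / (2 * alpha1) = 1, so B_4 should equal 3.
sigma2, alpha1 = 2.0, 1.0
for r in range(1, 9):
    print(r, limit_moment_recursive(r, sigma2, alpha1),
          limit_moment_closed_form(r, sigma2, alpha1))
```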


5. Second case. In this section we treat the second case described in Section 
2. Assumptions (0), (I) to (V) are as stated in Section 4. Others are 

ASSUMPTION (VI). The function $M(x)$ is bounded in any finite interval of $x$,
and we have
$$0 < \liminf_{|x| \to \infty} M(x)/x \le \limsup_{|x| \to \infty} M(x)/x < \infty.$$
ASSUMPTION (VII). For a certain even integer $p \ge 2$ we have
$$E[(Y(x) - M(x))^p] \le K_{18} < \infty.$$
We note that (I), (II) and (VI) imply that
$$K|x - \theta| \le |M(x) - \alpha| \le K_{19} |x - \theta|. \tag{1}$$
The constant $K > 0$ will figure prominently in what follows and so we omit its
subscript. We also introduce a new constant for the upper bound on the variance
of $Y(x)$:
$$E[(Y(x) - M(x))^2] \le K_{20} < \infty. \tag{2}$$
The existence of such a constant is of course implied by (VII); in fact $K_{20} \le K_{18}^{2/p}$.
In this section we set, for $n \ge 1$,
$$a_n = c/n, \qquad 0 < c < \infty,$$
where $c$ is to be chosen later. In contrast to Section 4, the initial value $x_1$ may
now be any random variable, bounded or not. The analysis in this section is
quite similar to and somewhat simpler than that in Section 4, and we shall be
more brief.

THEOREM 7. Suppose that Assumptions (I), (II), (VI) and (VII) are satisfied.*
If $a_n = c/n$ with $c > 1/2K$, then for every positive $r \le p$ we have
$$\limsup_{n \to \infty} n^{r/2} \beta_n^{(r)} \le B_r < \infty \tag{3}$$
where
$$B_r \le [r!/2^{r/2}(r/2)!]\,[K_{20} c^2/(2Kc - 1)]^{r/2}$$
for an even integer $r$.

* If we assume that $M(\cdot)$ is continuous everywhere then these four assumptions may be
replaced by (I) and (VII).

REMARK. To minimize the above bound for $B_r$ we should choose $c = 1/K$,
giving $B_r \le [r!/2^{r/2}(r/2)!]\,[K_{20}/K^2]^{r/2}$. However, since $K$ is unknown it is better
not to fix $c$.


PROOF. By (1) we have $d_n \ge K b_n$. On the other hand, by (1) and (2),
$$e_n = E[(y_n - M(x_n))^2] + E[(M(x_n) - \alpha)^2] \le K_{20} + K_{19}^2 b_n.$$
Using these in (4.6) we have
$$b_{n+1} \le (1 - 2Kc/n + K_{19}^2 c^2/n^2)\, b_n + K_{20} c^2/n^2
\le (1 - (2Kc - \eta)/n)\, b_n + K_{20} c^2/n^2$$
for $n > n_0(\eta, c)$, where $\eta > 0$ is arbitrarily small. Let $c > 1/2K$. Applying Lemma
1 we obtain
$$b_n \le \frac{K_{20} c^2}{2Kc - \eta - 1}\, \frac{1}{n} + O\Big(\frac{1}{n^2} + \frac{1}{n^{2Kc - \eta}}\Big).$$
Since $c$ may be chosen arbitrarily close to $1/2K$, and $\eta$ arbitrarily small, (3) is
true for $r = 2$ with $B_2 = K_{20} c^2/(2Kc - 1)$.
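Now let $r$ be even and assume that
$$\limsup_{n \to \infty} n^{t/2} \beta_n^{(t)} \le B_t < \infty$$
for $2 \le t \le r - 2$. Consider the $J_t$, $2 \le t \le r - 2$, defined in (4.9). We have,
by (1) and (2),
$$J_2 = c^2 n^{-2} E\big[ |x_n - \theta|^{r-2} E\{(y_n - M(x_n))^2 + (M(x_n) - \alpha)^2 \mid x_n\} \big]
\le c^2 n^{-2} \{K_{20}\, b_n^{(r-2)} + K_{19}^2\, b_n^{(r)}\}.$$
Similarly, if $3 \le t \le r - 2$, we have, using $(a + b)^t \le 2^t(a^t + b^t)$
$$|J_t| \le 2^t c^t n^{-t} [K_{18}\, \beta_n^{(r-t)} + K_{19}^t\, b_n^{(r)}] = O(n^{-(r+t)/2}) + O(n^{-t})\, b_n^{(r)}.$$
On the other hand, we have by (1),
$$E[(x_n - \theta)^{r-1}(M(x_n) - \alpha)] \ge K b_n^{(r)}.$$
Using these in (4.8) we have
$$b_{n+1}^{(r)} \le (1 - rKc/n + O(n^{-2}))\, b_n^{(r)} + \binom{r}{2} K_{20} c^2 B_{r-2}\, n^{-(r+2)/2} + O(n^{-(r+3)/2}).$$
Applying Lemma 1 we obtain
$$\limsup_{n \to \infty} n^{r/2} b_n^{(r)} \le r(r - 1) K_{20} c^2 B_{r-2}/(2rKc - r).$$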





Thus (3) is inductively true with
$$B_r \le [(r - 1) K_{20} c^2/(2Kc - 1)]\, B_{r-2}.$$
This yields the stated bounds for $B_r$ and proves (3) for every even $r \le p$. The
rest of the theorem follows from Lyapunov's inequality.

THEOREM 8. Suppose that the Assumptions (I), (II), (IV), and (VI) are satisfied;
and (VII) is satisfied with a certain $p \ge 6$. If $a_n = c/n$ with $c > 1/2K$ then
we have
$$\liminf_{n \to \infty} n b_n \ge \frac{K_2 c^2}{2\alpha_1 c - 1} > 0.$$


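REMARK. Note that $\alpha_1 \ge K$ so that $2\alpha_1 c - 1 > 0$.
PROOF. As in the proof of Theorem 4, given any $\eta > 0$, there exists a $\delta = \delta(\eta)$
such that
$$\int_{|x_n - \theta| \le \delta} (x_n - \theta)(M(x_n) - \alpha)\, d\Pr = (\alpha_1 + \eta'')\, b_n - \alpha_1 \int_{|x_n - \theta| > \delta} (x_n - \theta)^2 \, d\Pr$$
where $|\eta''| \le \eta$. By Theorem 7 we have
$$\int_{|x_n - \theta| > \delta} (x_n - \theta)^2 \, d\Pr \le \delta^{2-p}\, b_n^{(p)} = O(n^{-p/2}). \tag{4}$$
Furthermore, by (1) and (4) we have
$$\Big| \int_{|x_n - \theta| > \delta} (x_n - \theta)(M(x_n) - \alpha)\, d\Pr \Big| \le K_{19} \int_{|x_n - \theta| > \delta} (x_n - \theta)^2 \, d\Pr = O(n^{-p/2}).$$
Combining these results we obtain
$$d_n = (\alpha_1 + \eta'')\, b_n + O(n^{-p/2}). \tag{5}$$
By (IV) we have $e_n \ge K_2$. Using these in (4.6) we have, since $p \ge 6$,
$$b_{n+1} \ge [1 - 2(\alpha_1 + \eta)c/n]\, b_n + K_2 c^2 n^{-2} \{1 + o(1)\}.$$
Applying Lemma 2 we conclude that
$$\liminf_{n \to \infty} n b_n \ge \frac{K_2 c^2}{2(\alpha_1 + \eta)c - 1}.$$
Since $\eta$ is arbitrary Theorem 8 is proved.

COROLLARY. We have $\lim_{n \to \infty} (d_n/b_n) = \alpha_1$.

Proof of this follows from (5) and the theorem itself.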

THEOREM 9. Suppose that the Assumptions (I), (II), (V) and (VI) are satisfied;
(VII) is satisfied for every $p$ with $K_{18} = K_{18}(p)$. If $a_n = c/n$ with $c > 1/2K$,
then we have for every integer $r \ge 1$
$$\lim_{n \to \infty} n^{r/2}\, b_n^{(r)} =
\begin{cases}
0 & \text{if } r = 2s - 1, \\
[\sigma^2 c^2/(2\alpha_1 c - 1)]^s (2s - 1)(2s - 3) \cdots 3 \cdot 1 & \text{if } r = 2s.
\end{cases} \tag{6}$$
Consequently the random variable $n^{1/2}(x_n - \theta)$ tends in distribution to the
normal distribution with mean 0 and variance $\sigma^2 c^2/(2\alpha_1 c - 1)$.
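The limiting variance $\sigma^2 c^2/(2\alpha_1 c - 1)$ depends on the constant $c$ in $a_n = c/n$; elementary calculus shows that it is smallest at $c = 1/\alpha_1$, where it equals $\sigma^2/\alpha_1^2$. The sketch below evaluates the variance on a grid to illustrate this. The function name, the grid and the numerical values of $\sigma^2$ and $\alpha_1$ are assumptions for the example; in practice $\alpha_1$ is unknown.

```python
def asymptotic_variance(c, sigma2, alpha1):
    """Limiting variance of sqrt(n) * (x_n - theta) for a_n = c / n (Theorem 9)."""
    if 2.0 * alpha1 * c <= 1.0:
        raise ValueError("the formula requires 2 * alpha1 * c > 1")
    return sigma2 * c * c / (2.0 * alpha1 * c - 1.0)


# Illustrative values (assumptions): sigma^2 = 3, alpha_1 = 2.
sigma2, alpha1 = 3.0, 2.0

# Grid of admissible c values, all above 1 / (2 * alpha1) = 0.25.
grid = [0.26 + 0.01 * i for i in range(175)]
best_c = min(grid, key=lambda c: asymptotic_variance(c, sigma2, alpha1))

print("grid minimiser:", best_c, " 1/alpha1 =", 1.0 / alpha1)
print("variance at 1/alpha1:", asymptotic_variance(1.0 / alpha1, sigma2, alpha1),
      " sigma2/alpha1^2 =", sigma2 / alpha1 ** 2)
```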

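PROOF. The proof of Theorem 9 is similar to that of Theorem 6, except that
certain estimates are obtained in a slightly different way. We need only note
the following points.

For every $r \ge 0$, $\delta > 0$, $q > 0$ we have
$$\int_{|x_n - \theta| > \delta} |x_n - \theta|^r \, d\Pr \le \delta^{-s} \int_{|x_n - \theta| > \delta} |x_n - \theta|^{r+s} \, d\Pr = O(n^{-q}). \tag{7}$$
This follows from Theorem 7.

For every integer $r \ge 0$ and $q > 0$, we have
$$E[(x_n - \theta)^{r-1}(M(x_n) - \alpha)]
= (\alpha_1 + \eta) \int_{|x_n - \theta| \le \delta} (x_n - \theta)^r \, d\Pr + O\Big( \int_{|x_n - \theta| > \delta} |x_n - \theta|^r \, d\Pr \Big)
= (\alpha_1 + \eta)\, b_n^{(r)} + O(n^{-q})$$
where $\eta = \eta(\delta)$ and $\lim_{\delta \to 0} \eta(\delta) = 0$. This follows from (1) and (7). Furthermore
we have
$$E[(x_n - \theta)^{r-2}(M(x_n) - \alpha)^2] = O(n^{-r/2}).$$
This follows from (1) and Theorem 7.
Using the Corollary to Theorem 8 we obtain
$$b_{n+1}^{(r)} = [1 - r\alpha_1 c(1 + o(1))/n]\, b_n^{(r)} + \binom{r}{2} \sigma^2 c^2\, b_n^{(r-2)} \{1 + o(1)\}\, n^{-2}$$
(cf. (4.16)), from which the theorem follows.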


6. Linear case. In this section we consider the simplest possible $M(\cdot)$, namely,
a linear function
$$M(x) = \mu x - \theta, \qquad \mu \ne 0,$$
where both $\mu$ and $\theta$ are unknown. Without loss of generality we may suppose
$\mu > 0$ and set $\alpha = 0$. The problem is then to obtain $\theta/\mu$ stochastically. This
case is not covered by either [1] or [2] but is covered by our second case if $Y(x)$
has finite moments to a certain order. Here we treat it with a different method
under the sole hypothesis that there exists a distribution function $F(x)$ with mean
0 such that $H(y \mid x) = F(y - M(x))$.

Let the characteristic function of $F(x)$ be $f(t)$; then that of $H(y \mid x)$ is
$$e^{it(\mu x - \theta)} f(t).$$
In other words, we have the conditional expectation
$$E[e^{it y_n} \mid x_n] = e^{it(\mu x_n - \theta)} f(t). \tag{1}$$





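Let the characteristic function of $x_n$ be $f_n(t)$, $n \ge 1$. Then we have, recalling
that $x_{n+1} = x_n - a_n y_n$ and using (1),
$$f_{n+1}(t) = E[e^{it x_{n+1}}] = E[e^{it(x_n - a_n y_n)}]
= E[e^{it x_n} E\{e^{-it a_n y_n} \mid x_n\}] = E[e^{it x_n - it a_n(\mu x_n - \theta)} f(-a_n t)]
= E[e^{it(1 - \mu a_n) x_n}]\, f(-a_n t)\, e^{it a_n \theta} = f_n((1 - \mu a_n)t)\, f(-a_n t)\, e^{it a_n \theta}.$$
It follows by recursion that
$$f_{n+1}(t) = \exp\Big\{ \frac{it\theta}{\mu} \Big[ 1 - \prod_{k=1}^{n} (1 - \mu a_k) \Big] \Big\}
\prod_{k=1}^{n} f(-(1 - \mu a_n) \cdots (1 - \mu a_{k+1})\, a_k t)\; f_1\Big( \prod_{k=1}^{n} (1 - \mu a_k)\, t \Big).$$
If we now choose
$$a_n = \frac{1}{\mu n}, \tag{2}$$
then we have
$$\prod_{k=1}^{n} (1 - \mu a_k) = 0 \quad \text{and} \quad (1 - \mu a_n) \cdots (1 - \mu a_{k+1})\, a_k = \frac{1}{\mu n}.$$
Therefore we have, for every $n \ge 1$ and every initial $f_1(\cdot)$,
$$f_{n+1}(t) = e^{it\theta/\mu} [f(-t/\mu n)]^n. \tag{3}$$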


Equation (3) determines the distribution of $x_n$, $n \ge 2$, completely, at least
in theory. Let $\xi_1, \xi_2, \cdots$ be independent random variables with the same
distribution function $F(x)$ with mean 0. Then the characteristic function of
$$-(\xi_1 + \cdots + \xi_n)/\mu n$$
is precisely $[f(-t/\mu n)]^n$. Thus the study of (3) is reduced to a classical problem
in probability theory. We need not go into details here but shall content
ourselves with mentioning the following facts.
First, since $F(x)$ has mean 0 it follows from Khintchine's weak law of large
numbers (see e.g. [3], p. 253; [5], p. 138) that
$$\lim_{n \to \infty} f(-t/\mu n)^n = 1.$$
Therefore by (3)
$$\lim_{n \to \infty} f_n(t) = e^{it\theta/\mu}$$
and consequently $x_n$ tends to $\theta/\mu$ in probability. It is curious that in this simple
case the stochastic convergence of the procedure is equivalent to Khintchine's
theorem.
Next it follows, by a classical result of P. Lévy ([4], p. 254; see also [5], p. 163),
that all possible limit distributions of $x_n - \theta/\mu$ are stable laws (including
the normal) of exponent $\alpha$: $1 < \alpha \le 2$. More precisely, if there exists a sequence
of positive constants $\{A_n\}$ such that $(\xi_1 + \cdots + \xi_n)/A_n$ tends in distribution$^7$ to
the stable law $G(x)$, then $(\mu n/A_n)(\theta/\mu - x_n)$ does the same. In particular, every
stable law of exponent $\alpha$, $1 < \alpha \le 2$, is the limit distribution of $x_n - \theta/\mu$, for a
suitable choice of $F(\cdot)$.

Finally, it is known (see [5], p. 186) that there exist distribution functions
$F(x)$ such that $A_n(x_n - \theta/\mu)$ does not tend in distribution to a limit, whatever
the sequence $A_n$ may be.

As a last remark, it is clear from (3) that the proper choice of $\{a_n\}$ must depend
to a certain extent on the unknown parameter $\mu$. In fact, any other choice
than (2), even $a_n = c/n$ with $c \ne 1/\mu$, already greatly complicates the analysis
given above. It is therefore small wonder that in Sections 4 and 5 the choice
of $\{a_n\}$ has to depend on the unknown.
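The reduction behind (3) can be checked directly: with $M(x) = \mu x - \theta$, $\alpha = 0$ and $a_n = 1/\mu n$, the recursion $x_{n+1} = x_n - a_n y_n$ gives $x_{n+1} = \theta/\mu - (\xi_1 + \cdots + \xi_n)/\mu n$ whatever the initial value $x_1$. The sketch below verifies this identity on simulated noise; the Gaussian noise, the seed and the parameter values are assumptions made for the illustration only.

```python
import random


def run_linear_case(mu, theta, noise, x1):
    """Iterate x_{n+1} = x_n - y_n / (mu * n) with y_n = mu * x_n - theta + xi_n."""
    x = x1
    for n, xi in enumerate(noise, start=1):
        y = mu * x - theta + xi      # observation with M(x) = mu * x - theta
        x = x - y / (mu * n)         # step size a_n = 1 / (mu * n), choice (2)
    return x


def closed_form(mu, theta, noise):
    """theta / mu minus the noise average divided by mu, as implied by (3)."""
    return theta / mu - sum(noise) / (mu * len(noise))


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]   # mean-zero errors (assumed Gaussian)
mu, theta = 2.5, 4.0

from_zero = run_linear_case(mu, theta, noise, x1=0.0)
from_far = run_linear_case(mu, theta, noise, x1=1000.0)
print(from_zero, from_far, closed_form(mu, theta, noise))
```

Both runs agree with the closed form up to rounding, which reflects the fact that the first step with $a_1 = 1/\mu$ removes all dependence on $x_1$.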


7. Consequences. In this section we sketch briefly some statistical
consequences of the results of Section 5. For brevity we state strong assumptions
which may obviously be weakened. We put $\alpha = 0$ without loss of generality.
Let $\mathcal{H} = \{H\}$ be a family of functions of the type denoted by $H(\cdot \mid \cdot)$ in Section 2.
Denote by $M_H(\cdot)$, $\theta_H$, $\sigma_H^2$, $K_H$ etc., the $M(\cdot)$, $\theta$, $\sigma^2$, $K$, etc. (if they exist)
corresponding to a given $H$.

We assume throughout this section that, for each $H \in \mathcal{H}$, Assumptions (I),
(V), and (VII) (with subscripts $H$) are satisfied, and that there are (known)
positive numbers $\gamma$ and $\beta$ such that $\alpha_{1H} \ge \gamma$ and $\sigma_H^2 \le \beta$ for all $H \in \mathcal{H}$. (The
constancy of $\sigma_H^2$ for fixed $H$ is not necessary; see footnote 4.) We suppose also that
there is an interval $J$ of positive length such that $\mathcal{H}$ contains the family $\mathcal{G}$
consisting of every function $H$ for which, for some $z \in J$, $H(y + \gamma(x - z) \mid x)$ is, for
all $x$, the normal distribution function with mean 0 and variance $\beta$. For simplicity
we put $x_1 = 0$ throughout this section.

Suppose now that, for a given even integer $r \ge 2$, the conclusion to Theorem 9
with $c = 1/\gamma$ holds for that $r$ uniformly over all $H$ in $\mathcal{H}$. It may be important
for applications to note that it suffices for our argument below that for every
$\varepsilon > 0$ there exists an $N(\varepsilon)$ such that $n^{r/2} b_{nH}^{(r)} (1 - \varepsilon)$ is $\le$ the right side of (5.6)
for all $n > N(\varepsilon)$ and all $H \in \mathcal{H}$. It is clear that usually this would have been
more difficult to satisfy under the assumptions of Section 4 (especially III).

An examination of the proof of Theorem 9 shows that the validity of the
conclusion for a fixed $r$ need not entail Assumption VII for all $p$. The reader may
scrutinize our proofs to obtain various conditions on $\mathcal{H}$ under which the
conclusion stated above holds. Note that it can hold even if the domain of values of
$\theta_H$ for $H \in \mathcal{H}$ is unbounded, despite the unboundedness of $b_{1H}^{(r)}$ over $\mathcal{H}$ in that
case. For example, if $\mathcal{H}$ is a family like that of Section 6 (with $r$th moment finite)
with $\gamma = \mu = 1/c$ ($\mu$ is known), then $b_{nH}^{(r)}$ is independent of $H$ (i.e., of $\theta_H$) for
$n > 1$ (see (6.3)). Similar remarks with inequality apply if $\mathcal{H}$ consists of all $H$
for which $M_H$ is linear with known slope and the $r$th central moment of $H$ is
bounded over $\mathcal{H}$. For $r = 2$ and such an $\mathcal{H}$ (that is, $W(H, d) = (\theta_H - d)^2$ below),
the result of this section was obtained in the Hodges and Lehmann manuscript
cited in Section 2. Our $M_H$ need not, of course, be linear.

$^7$ It can be shown (see [5], p. 175, Translator's note) that if there exist $\{A_n\}$ and $\{A_n'\}$
such that $(\xi_1 + \cdots + \xi_n - A_n')/A_n$ tends in distribution to a stable law, then we may take
$A_n' = 0$ for all $n$. This is because $\int x \, dF(x) = 0$.

Let $S_n$ be the $n$-observation statistical procedure defined by using the Robbins-Monro
scheme with $x_1 = 0$ and $a_n = 1/\gamma n$ for $n$ observations $y_1, \cdots, y_n$ and
then estimating $\theta_H$ by $x_{n+1}$. Clearly, as $n \to \infty$
$$\sup_{H \in \mathcal{H}} E_H |x_{n+1} - \theta_H|^r = n^{-r/2} (\beta/\gamma^2)^{r/2} [(r - 1)(r - 3) \cdots 3 \cdot 1][1 + o(1)]. \tag{1}$$

Let $H' \in \mathcal{G}$. Then the random variables $u_i = x_i - (1/\gamma) y_i$, $i = 1, 2, \cdots$, are
independently and identically distributed Gaussian variables (the correlation
between any two of them is easily computed to be 0) with mean $\theta_{H'}$ and variance
$\beta/\gamma^2$. A knowledge of the values taken on by $u_1, \cdots, u_n$ is equivalent to that of
those taken on by $y_1, \cdots, y_n$ (recall that $x_1 = 0$), and $x_{n+1} = n^{-1} \sum_{i=1}^{n} u_i$ is a
sufficient statistic for the family $\mathcal{G}$. Since $J$ has positive length and
$$E_{H'} |x_{n+1} - \theta_{H'}|^r = n^{-r/2} (\beta/\gamma^2)^{r/2} [(r - 1)(r - 3) \cdots 3 \cdot 1] = O(n^{-r/2}) \quad \text{for all } H' \in \mathcal{G},$$
a simple modification of the argument of Wolfowitz [6] (as applied to the fixed
sample-size case) shows that, if $\mathcal{T}_n$ is the class of all procedures $T$ requiring a total
of $n$ observations (which are taken sequentially by determining $x_1, \cdots, x_n$ in
any prescribed manner, not necessarily that of $S_n$), and if $\delta_T$ is the final estimator
of $\theta_H$ when the procedure $T$ is used (thus, $\delta_{S_n} = x_{n+1}$ when the $x_i$ are determined
by the scheme $S_n$), then as $n \to \infty$,
$$\inf_{T \in \mathcal{T}_n} \sup_{H' \in \mathcal{G}} E_{H'} |\delta_T - \theta_{H'}|^r = n^{-r/2} (\beta/\gamma^2)^{r/2} [(r - 1)(r - 3) \cdots 3 \cdot 1][1 - o(1)]. \tag{2}$$


We conclude from (1) and (2) that for the problem of point estimation of $\theta_H$
based on $n$ observations which may be taken in any manner, and when the weight
function (the loss when $H$ is the "true" member of $\mathcal{H}$ and we estimate $\theta_H$ by $d$)
is $W(H, d) = \text{const.} \cdot |\theta_H - d|^r$, the procedure $S_n$ is asymptotically minimax in the
sense that
$$\lim_{n \to \infty} \frac{\sup_{H \in \mathcal{H}} E_H W(H, \delta_{S_n})}{\inf_{T \in \mathcal{T}_n} \sup_{H \in \mathcal{H}} E_H W(H, \delta_T)} = 1. \tag{3}$$
Thus, for large $n$ the procedure $S_n$ seems to be very satisfactory.
The above result may easily be strengthened. Assume for simplicity that the
interval $J$ characterizing $\mathcal{G}$ is the whole real line. Let $\mathcal{T}$ now be the class of all
sequential procedures $T$ which terminate (no longer necessarily at a specified $n$)
with probability one under all $H$ in $\mathcal{H}$. As before, let $\delta_T$ be the estimator of $\theta_H$
when $T$ is used, and let $E_H(N \mid T)$ be the expected number of observations before
termination of the experiment when $H$ is the "true" member of $\mathcal{H}$ and the
procedure $T$ is used. Let $c > 0$ be the cost of taking a single observation $y_n$,
whatever be $T$, $n$, and $x_n$. Let
$$r(H, T, c) = c\, E_H(N \mid T) + E_H W(H, \delta_T) \tag{4}$$
be the risk function of $T$ when $c$ is the cost of experimentation. Then the results
of [6] (for the sequential case) and an argument like that of the previous
paragraph show that, for the setup of the previous paragraph, there is an integral
valued function $\nu(\cdot)$ on the positive reals (which is easy to calculate from [6])
such that
$$\lim_{c \to 0} \frac{\inf_{T \in \mathcal{T}} \sup_{H \in \mathcal{H}} r(H, T, c)}{\sup_{H \in \mathcal{H}} r(H, S_{\nu(c)}, c)} = 1. \tag{5}$$


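Alternatively, we may state that if $\mathcal{T}_n^*$ is the subset of $\mathcal{T}$ for which
$\sup_{H \in \mathcal{H}} E_H(N \mid T) \le n$, then
$$\lim_{n \to \infty} \frac{\inf_{T \in \mathcal{T}_n^*} \sup_{H \in \mathcal{H}} E_H W(H, \delta_T)}{\sup_{H \in \mathcal{H}} E_H W(H, \delta_{S_n})} = 1. \tag{6}$$
The dual property to (6), wherein we minimax $E_H(N \mid T)$ subject to
$\sup_H E_H W(H, \delta_T) \le w$ as $w \to 0$, can be stated similarly.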

The results of the two previous paragraphs may easily be extended to weight
functions other than $W(H, d) = |\theta_H - d|^r$. The results of [6] may be applied
whenever $W(H, d)$ is a nondecreasing function of $|\theta_H - d|$ satisfying appropriate
integrability conditions (see [6]). Thus, one need only verify that $\mathcal{H}$ satisfies a
condition like that of footnote 2 above with the inequality replaced by


E,WWé6(H, La) BE y,W(A, X,) < l—e« 


for some $H'$ in $\mathcal{G}$, in order to obtain results like (3), (5) and (6) above. Particular
choices of $W$ will give results on interval estimation, etc.

Questions of optimality for the setup of Section 4 remain unanswered at present
because the technique used in Section 7 is not applicable, linear $M(\cdot)$ being
disallowed in Section 4 and our knowledge of the case $a_n = c/n$ being incomplete
there. We need not, of course, detail the remark that the results of Section 4
(like those of Sec. 7) may still be used to obtain asymptotic confidence
intervals, etc.

In conclusion, I wish to thank my colleague, Professor J. C. Kiefer, for criticism 
of the MS and for contributing largely to Section 7. 







REFERENCES 

[1] HERBERT ROBBINS AND SUTTON MONRO, "A stochastic approximation method," Ann.
Math. Stat., Vol. 22 (1951), pp. 400-407.

[2] J. WOLFOWITZ, "On the stochastic approximation method of Robbins and Monro,"
Ann. Math. Stat., Vol. 23 (1952), pp. 457-461.

[3] H. CRAMÉR, Mathematical Methods of Statistics, Princeton University Press, 1946.

[4] P. LÉVY, Calcul des Probabilités, Gauthier-Villars, Paris, 1925.

[5] B. V. GNEDENKO AND A. N. KOLMOGOROV, Limit Distributions for Sums of Independent
Random Variables; English translation by K. L. Chung, Addison-Wesley Co.,
Cambridge, 1954.

[6] J. WOLFOWITZ, "Minimax estimation of the mean of a normal distribution with known
variance," Ann. Math. Stat., Vol. 21 (1950), pp. 218-230.

[7] L. SCHMETTERER, "Bemerkungen zum Verfahren der stochastischen Iteration,"
Österreich. Ing.-Arch., Vol. 7 (1953), pp. 111-117.





SOME THEOREMS ON QUADRATIC FORMS APPLIED IN THE STUDY 
OF ANALYSIS OF VARIANCE PROBLEMS, II. EFFECTS OF INEQUALITY 
OF VARIANCE AND OF CORRELATION BETWEEN ERRORS IN THE 
TWO-WAY CLASSIFICATION 


By G. E. P. Box 


Imperial Chemical Industries, Manchester, England 
and North Carolina State College 


1. Summary and Introduction. Theorems already enunciated in a previous 
paper on quadratic forms are used to determine the effects of inequality of 
variance and first order serial correlation of errors in the two-way classification 
on the analysis of variance. It is found that when the appropriate null hypothesis
is true, inequality of variance from column to column results in an increased 
chance of exceeding the significance point for the test on homogeneity of column 
means, and a decreased chance for the corresponding test on row means. For 
moderate differences in variance neither effect is large. First order serial correla- 
tion within rows produces a large effect on the “between rows’’ comparisons, 
but little effect on the “between columns” comparisons. 


2. The two-way analysis of variance classification. Consider the analysis of 
variance for a two-way table with k columns and n rows, with one observation 
in each cell. Experiments in which k treatments are tested in n blocks are an 
important source of data classified in this way. In such tables the variance might 
change from treatment to treatment due to the influence of the treatments 
themselves. Changes in variance might also occur from block to block, for in some 
circumstances where experimental material was inhomogeneous in mean from 
block to block it might well be inhomogeneous in variance also. 

A further source of departure from the assumptions usually made in the
analysis of variance concerns possible lack of independence between the "error"
components of the observations. In many types of experiments this difficulty is
met by the introduction of randomisation. Data occur, however, in circumstances
where there is no possibility of using this device, usually because the
factor which is to be studied is the effect of time or position, which itself gives
rise to the correlation.

For instance, the first example of analysis of variance of a two-way table in 
R. A. Fisher's Statistical Methods for Research Workers [1] concerns data quoted
from Shaw [2] on the frequency of rainfall classified by hour of the day and 
month of the year. As Fisher himself points out, strong serial correlation between 
errors within months occurs because showers of rain which last more than one 
hour are recorded in successive hours. No question of randomisation arises in 


Received 6/15/53 





THEOREMS ON QUADRATIC FORMS 485 


this example. In discussing the analysis of variance table, Fisher remarks that 
the serial correlation between hours within months entirely invalidates the 
“between months” comparisons, but that the ‘between hours’ comparisons 
may still be made (as an approximate test). The truth of the latter part of this 
statement is perhaps not immediately obvious, and it is of interest to make a 
closer study of such examples. 

Other instances of two-way tables in which serial correlation between errors 
might be expected are quoted by Daniels [3] in experiments in wool research 
where, for example, the variation in weight of slubbing coming from adjacent 
positions on the wool card is considered. Daniels recognised that correlation 
effects might invalidate the analysis of variance procedure and carried out some 
theoretical investigation of the problem [4]. He considered the effects of small 
inequalities of variance and small correlations between errors, using an approxi- 
mate method. Tests for the existence of departures from assumptions in the two- 
way table were discussed by Box in 1950 [5], when reference was given to the 
results now published. 

In what follows we retain the assumption of normality, but allow the variance 
to differ from column to column and correlation to occur within rows. By sub- 
stituting columns for rows we can also study the effect of differences in variance 
from row to row and the effect of correlation within columns. 


3. Distribution of items in the analysis of variance table. We need to refer to 
theorems, equations and sections of a previous paper [6] with the same general 
title. We indicate such reference by the addition of a prime to the number of the 
theorem, etc. Thus Theorem 2.1’ and Section 5’ refer to Theorem 2.1 and Section 
5, respectively, of the previous paper. 

Suppose we have a two-way classification of observations with $k$ columns and
$n$ rows and $y_{ti}$ is the observation in the $t$th column and $i$th row. Then we can
perform an analysis of variance corresponding to the entries in the first three columns
of Table 1. We make the usual assumptions that $y_{ti}$ may be represented by a
linear model
$$y_{ti} = \alpha + \beta_i + \gamma_t + z_{ti}; \qquad \sum_{i=1}^{n} \beta_i = 0. \tag{3.1}$$
Alternatively we can denote the model for all the elements of the $t$th column of
the table ($t = 1, 2, \cdots, k$) by
$$y_{t\cdot} = \alpha 1_n + \beta + \gamma_t 1_n + z_{t\cdot} \tag{3.2}$$
where $y_{t\cdot}$ is the $n \times 1$ vector of entries in the $t$th column, $z_{t\cdot}$ is the corresponding
vector of errors, $1_n$ is an $n \times 1$ vector of unit elements, and $\beta$ an $n \times 1$ vector of
row constants $\beta_1, \beta_2, \cdots, \beta_n$. We shall also need the notation $y_{\cdot i}$, $z_{\cdot i}$ to denote
$k \times 1$ vectors of observations and errors in the $i$th row of the table.
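The analysis of variance for such a table partitions the total sum of squares about the grand mean into a "between rows" part, a "between columns" part and a residual. The sketch below computes these three quantities for a small table with one observation per cell and confirms that they add up to the total. The function name, the dictionary keys and the example data are assumptions for illustration; the layout of Table 1 itself is not reproduced here.

```python
def two_way_sums_of_squares(table):
    """Sums of squares for an n-row by k-column table; table[i][t] is row i, column t."""
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][t] for i in range(n)) / n for t in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_resid = sum(
        (table[i][t] - row_means[i] - col_means[t] + grand) ** 2
        for i in range(n)
        for t in range(k)
    )
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    return {"rows": ss_rows, "columns": ss_cols, "residual": ss_resid, "total": ss_total}


# Hypothetical data: n = 4 rows (blocks) and k = 3 columns (treatments).
table = [
    [4.1, 5.0, 6.2],
    [3.8, 5.4, 5.9],
    [4.5, 5.1, 7.0],
    [4.0, 4.7, 6.4],
]
ss = two_way_sums_of_squares(table)
print(ss)
```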

We do not make the usual assumption that the $z_{ti}$ have the same variance
and are uncorrelated. Instead we assume that $z_{\cdot i}$ follows the normal multivariate
law with variance-covariance matrix $\mathcal{E}(z_{\cdot i} z_{\cdot i}') = v = \{v_{tu}\}$. We further assume
that $z_{\cdot j}$ ($j = 1, 2, \cdots, i - 1, i + 1, \cdots, n$) follows the same law independently
of $z_{\cdot i}$. Thus $v_{11}, \cdots, v_{tt}, \cdots, v_{kk}$ are the $k$ variances and $v_{12}, v_{13}, \cdots, v_{tu}, \cdots,
v_{k-1,k}$ the $\tfrac{1}{2}k(k - 1)$ covariances, the same for every row. This enables us to
study the effects of column to column heterogeneity of variance and/or "within
rows" correlation of errors. The expected values and null distributions of the
sums of squares, when the observations are so represented, are shown in Table 1.
They are derived below.

TABLE 1. Analysis of variance for a two-way table, with variance unequal from column to column and errors correlated within rows: sums of squares $Q$, expectation of $Q$, null distribution of $Q$. [Table body not recoverable.]

Let $Y_{t.}$ be an n x 1 vector of elements $Y_{t1}, Y_{t2}, \ldots, Y_{tn}$ obtained from $y_{t.}$ by the orthogonal transformation $Y_{t.} = p y_{t.}$, and let the n x n orthogonal matrix $p$ have all the elements of its last row equal to $n^{-1/2}$, thus ensuring that $Y_{tn} = n^{1/2}\bar{y}_{t.}$. Then

(3.3)   $Y_{t.} = p y_{t.} = \alpha\delta + B + \gamma_t\delta + Z_{t.}$

where $\delta = p 1_n$, $B = p\beta$ and $Z_{t.} = p z_{t.}$.

Due to the nature of $p$, in the vector $\delta$ the last element is $n^{1/2}$ and the remaining elements are zeros, and in $B$ the last element is zero, since $\sum_{i=1}^{n} \beta_i = 0$. The transformed columns of the original two-way table and the transformed column of row means may now be written out as follows:


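Row      Column 1                                   ...   Column t                                   ...   Column k                                   Row Means

1        $B_1 + Z_{11}$                             ...   $B_1 + Z_{t1}$                             ...   $B_1 + Z_{k1}$                             $B_1 + \bar{Z}_{.1}$
...
i        $B_i + Z_{1i}$                             ...   $B_i + Z_{ti}$                             ...   $B_i + Z_{ki}$                             $B_i + \bar{Z}_{.i}$
...
n - 1    $B_{n-1} + Z_{1,n-1}$                      ...   $B_{n-1} + Z_{t,n-1}$                      ...   $B_{n-1} + Z_{k,n-1}$                      $B_{n-1} + \bar{Z}_{.n-1}$
n        $n^{1/2}(\alpha + \gamma_1) + Z_{1n}$      ...   $n^{1/2}(\alpha + \gamma_t) + Z_{tn}$      ...   $n^{1/2}(\alpha + \gamma_k) + Z_{kn}$      $n^{1/2}\alpha + \bar{Z}_{.n}$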


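Now consider the nk x 1 partitioned vector $z$ and the nk x nk partitioned matrices $P$ and $V$ defined by

(3.4)   $z = \begin{pmatrix} z_{1.} \\ z_{2.} \\ \vdots \\ z_{k.} \end{pmatrix}, \quad P = \begin{pmatrix} p & 0_n & \cdots & 0_n \\ 0_n & p & \cdots & 0_n \\ \vdots & & \ddots & \vdots \\ 0_n & 0_n & \cdots & p \end{pmatrix}, \quad V = \begin{pmatrix} v_{11} I_n & v_{12} I_n & \cdots & v_{1k} I_n \\ v_{21} I_n & v_{22} I_n & \cdots & v_{2k} I_n \\ \vdots & & \ddots & \vdots \\ v_{k1} I_n & v_{k2} I_n & \cdots & v_{kk} I_n \end{pmatrix}$

where $I_n$ is the n x n unit matrix and $0_n$ is the n x n null matrix. For $Z$, denoting the vector $Pz$ of transformed variables, the matrix of variances and covariances is

(3.5)   $\mathcal{E}(ZZ') = \mathcal{E}(Pzz'P') = P\,\mathcal{E}(zz')P' = PVP' = V.$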







Since we are concerned with normal variates, it follows that the $Z_{ti}$ are distributed in precisely the same manner as are the $z_{ti}$; that is, the vector $Z_{.i}$ of transformed errors in the $i$th row follows the normal multivariate law, with variance-covariance matrix $v$, independently of the errors in the other rows.

Between columns sum of squares.

(3.6)   $Q_C = n\sum_{t=1}^{k} (\bar{y}_{t.} - \bar{y}_{..})^2 = \sum_{t=1}^{k} (n^{1/2}\gamma_t + Z_{tn} - \bar{Z}_{.n})^2$

and the matrix of the quadratic form $q_n = \sum_{t=1}^{k} (Z_{tn} - \bar{Z}_{.n})^2$ is

(3.7)   $m = I_k - k^{-1} 1_k 1_k'$

while the variance-covariance matrix for the vector of errors $Z_{.n}$ is $v$. We have therefore

(3.8)   $u = \{u_{ts}\} = vm = \{v_{ts} - \bar{v}_{t.}\}$

where $u_{ts}$ is the element of the $t$th row and $s$th column of the matrix $u$ and $\bar{v}_{t.}$ is the arithmetic mean of the entries in the $t$th row (or column) of $v$. It follows from equation (2.5’) that the expectation and null distribution of $Q_C$ are those shown in Table 1, where the $\lambda$'s are the latent roots of $u$.
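The construction in (3.8) can be checked numerically. The following is a minimal sketch in plain Python, not part of the original paper; the 3 x 3 matrix `v` is a hypothetical example and the helper name `u_matrix` is ours. It forms $u = vm$ by subtracting each row mean of $v$, and illustrates two consequences: every row of $u$ sums to zero (so $u$ has one zero latent root, leaving at most k - 1 nonzero roots), and the trace of $u$, which is the sum of its latent roots, equals (k - 1) times the difference between the mean variance and the mean covariance.

```python
def u_matrix(v):
    """Return u = v m, i.e. u[t][s] = v[t][s] - (mean of row t of v)."""
    k = len(v)
    row_means = [sum(row) / k for row in v]
    return [[v[t][s] - row_means[t] for s in range(k)] for t in range(k)]


# Hypothetical 3 x 3 variance-covariance matrix, chosen only for illustration.
v = [[4.0, 1.0, 0.5],
     [1.0, 2.0, 0.3],
     [0.5, 0.3, 1.0]]
k = len(v)
u = u_matrix(v)

# Sum of the latent roots of u is its trace.
trace_u = sum(u[t][t] for t in range(k))

# Mean variance and mean covariance of v.
mean_variance = sum(v[t][t] for t in range(k)) / k
mean_covariance = sum(
    v[t][s] for t in range(k) for s in range(k) if t != s
) / (k * (k - 1))

print("trace of u:", trace_u)
print("(k - 1)(mean variance - mean covariance):",
      (k - 1) * (mean_variance - mean_covariance))
print("row sums of u:", [sum(row) for row in u])
```

The latent roots themselves would need an eigenvalue routine, which is deliberately left out of this sketch.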

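Residual (error) sum of squares.

(3.9)   $Q_E = \sum_{t=1}^{k}\sum_{i=1}^{n} (y_{ti} - \bar{y}_{t.} - \bar{y}_{.i} + \bar{y}_{..})^2 = \sum_{i=1}^{n-1}\sum_{t=1}^{k} (Y_{ti} - \bar{Y}_{.i})^2 = \sum_{i=1}^{n-1}\sum_{t=1}^{k} (Z_{ti} - \bar{Z}_{.i})^2.$

Denote $\sum_{t=1}^{k} (Z_{ti} - \bar{Z}_{.i})^2$ by $q_i$ $(i = 1, 2, \ldots, n)$. Then $q_i$ follows the same distribution as $q_j$ $(j = 1, 2, \ldots, n;\ j \ne i)$, independently of $q_j$. In particular it follows the same distribution as $q_n$ discussed above. Also, $\sum_{i=1}^{n-1} q_i = Q_E$ is distributed independently of $Q_C$, in the form indicated in Table 1.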

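Between rows sum of squares.

(3.10)   $Q_R = k\sum_{i=1}^{n} (\bar{y}_{.i} - \bar{y}_{..})^2 = k\sum_{i=1}^{n-1} (B_i + \bar{Z}_{.i})^2.$

Remembering that $\sum_{i=1}^{n-1} B_i^2 = \sum_{i=1}^{n} \beta_i^2$ we have

(3.11)   $\mathcal{E}(Q_R) = k\sum_{i=1}^{n} \beta_i^2 + (n-1)\{\bar{v}_{tt} + (k-1)\bar{v}_{ts}\}$

where $\bar{v}_{tt}$ is the average variance $\sum_{t} v_{tt}/k$ and $\bar{v}_{ts}$ is the average covariance $\sum_{t \ne s} v_{ts}/\{k(k-1)\}$. Now $\bar{Z}_{.i}$ is distributed normally and independently of $\bar{Z}_{.j}$ $(i \ne j = 1, 2, \ldots, n)$. Hence, when the null hypothesis that $\sum_{i=1}^{n} \beta_i^2 = 0$ is true, $Q_R$ is distributed like $X_R = \{\bar{v}_{tt} + (k-1)\bar{v}_{ts}\}\chi^2(n-1)$. Since $Z_{.n}$ is distributed independently of $Z_{.i}$ $(i = 1, \ldots, n-1)$, $Q_R$ and $Q_C$ are distributed independently. Usually $Q_R$ will not be distributed independently of $Q_E$, however, as will now be shown.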

Dependence of $Q_R$ and $Q_E$. To investigate the dependence of $Q_R$ and $Q_E$ we transform the k x 1 vector $Z_{.i}$ of the transformed variates in the $i$th row of the two-way table to the vector $W_{.i}$ by means of the orthogonal transformation $W_{.i} = RZ_{.i}$, where the elements of the last ($k$th) row of $R = \{r_{ts}\}$ are all equal to $k^{-1/2}$ so that $W_{ki} = k^{1/2}\bar{Z}_{.i}$. The variance-covariance matrix for the new variates is now given by

(3.12)   $\mathcal{E}(W_{.i}W_{.i}') = \mathcal{E}(RZ_{.i}Z_{.i}'R') = R\,\mathcal{E}(Z_{.i}Z_{.i}')R' = RvR'$

and therefore $\mathcal{E}(W_{ti}W_{ki}) = k^{1/2}\sum_{s=1}^{k} \bar{v}_{s.} r_{ts}$. Now $\sum_{s=1}^{k} r_{ts} = 0$ for $t = 1, 2, \ldots, k-1$. The covariances between $W_{ki}$ and $W_{1i}, W_{2i}, \ldots, W_{k-1,i}$ cannot therefore all be zero unless $\bar{v}_{s.}$, the mean of entries in the $s$th row or column of $v$, is constant for all $s$, since in a k-space only one vector can be simultaneously at right angles to k - 1 other linearly independent vectors. In particular the condition that $\bar{v}_{s.}$ is constant for all $s$ is satisfied when the observations are independent and the variances are equal (when $v = \sigma^2 I_k$) and also when the observations are circularly correlated. This condition usually will not be satisfied, however. In particular it will not be satisfied when the observations are independent but the variances are unequal, or when the variances are equal and the observations are serially but not circularly correlated.

If $W_{ki}$ is not distributed independently of $W_{ti}$ $(t = 1, 2, \ldots, k-1)$, then $W_{ki}^2$ will not be distributed independently of $\sum_{t=1}^{k-1} W_{ti}^2$, and $Q_R = \sum_{i=1}^{n-1} W_{ki}^2$ will not be distributed independently of $Q_E = \sum_{i=1}^{n-1}\sum_{t=1}^{k-1} W_{ti}^2$.


4. Distribution of test criteria. 


Between columns test. When the appropriate null hypothesis is true, the ratio of mean squares $(n-1)Q_C/Q_E$ is distributed like $(n-1)X_C/X_E$, where

(4.1)   $X_C/X_E = \left\{\sum_{j=1}^{k-1} \lambda_j\chi^2(1)\right\} \Big/ \left\{\sum_{j=1}^{k-1} \lambda_j\chi^2(n-1)\right\},$

the $\lambda$'s, which are the same for both numerator and denominator, are the k - 1 nonzero latent roots of the matrix $u = \{v_{ts} - \bar{v}_{t.}\}$, and the numerator and denominator are distributed independently. We may use the exact series of Theorem 4.1’ to find the value of $\Pr(Q_C/Q_E > \psi_0)$ and so provide a check on the F approximation, provided we choose examples in which n is odd so that n - 1 is even.

To use the approximation of Theorem 6.1’ we require the first two cumulants of $Q_C$ and $Q_E$ when the null hypothesis is true. Using equations (2.5’) and (2.6’) we have

(4.2)   $K_1(Q_C) = k(\bar{v}_{tt} - \bar{v}_{..}) = (k-1)(\bar{v}_{tt} - \bar{v}_{ts}),$

(4.3)   $K_2(Q_C) = 2\sum_{t=1}^{k}\sum_{s=1}^{k} (v_{ts} - \bar{v}_{t.})(v_{st} - \bar{v}_{s.}) = 2\left\{\sum_{t=1}^{k}\sum_{s=1}^{k} v_{ts}^2 - 2k\sum_{t=1}^{k} \bar{v}_{t.}^2 + k^2\bar{v}_{..}^2\right\} = 2\sum_{t=1}^{k}\sum_{s=1}^{k} (v_{ts} - \bar{v}_{t.} - \bar{v}_{.s} + \bar{v}_{..})^2,$

where $\bar{v}_{t.} = \sum_{s=1}^{k} v_{ts}/k$, while $\bar{v}_{..} = \sum_{t=1}^{k}\sum_{s=1}^{k} v_{ts}/k^2$ and $\bar{v}_{tt} = \sum_{t=1}^{k} v_{tt}/k$. Now $K_1(Q_E) = (n-1)K_1(Q_C)$ and $K_2(Q_E) = (n-1)K_2(Q_C)$. Hence the null distribution of the ratio of mean squares $(n-1)Q_C/Q_E$ is approximately that of $F\{(k-1)\epsilon, (k-1)(n-1)\epsilon\}$ where

(4.4)   $\epsilon = k^2(\bar{v}_{tt} - \bar{v}_{..})^2 \Big/ \left[(k-1)\left(\sum_{t=1}^{k}\sum_{s=1}^{k} v_{ts}^2 - 2k\sum_{t=1}^{k} \bar{v}_{t.}^2 + k^2\bar{v}_{..}^2\right)\right].$

We notice that the comparison of column and residual mean squares is without bias, whatever the nature of the matrix $v$. The discrepancy that arises is represented in the approximation as a reduction by the same fraction $\epsilon$ of both degrees of freedom in the F ratio.
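As an illustration of (4.4), the sketch below computes the reduction factor $\epsilon$ directly from a variance-covariance matrix. It is our own plain-Python sketch, not the author's computing procedure; the function names `box_epsilon` and `compound_symmetric` are hypothetical. Two sanity checks are included: $\epsilon = 1$ when $v = \sigma^2 I_k$, and $\epsilon = 1$ for a matrix with equal variances and equal covariances, since in that case $u$ is proportional to $m$ and all nonzero latent roots coincide.

```python
def box_epsilon(v):
    """Degrees-of-freedom reduction factor of equation (4.4) for a k x k matrix v."""
    k = len(v)
    row_means = [sum(row) / k for row in v]           # v-bar_{t.}
    grand_mean = sum(row_means) / k                   # v-bar_{..}
    mean_variance = sum(v[t][t] for t in range(k)) / k  # v-bar_{tt}
    sum_sq = sum(x * x for row in v for x in row)     # sum of v_{ts}^2
    numerator = k * k * (mean_variance - grand_mean) ** 2
    denominator = (k - 1) * (
        sum_sq - 2 * k * sum(r * r for r in row_means) + k * k * grand_mean ** 2
    )
    return numerator / denominator


def compound_symmetric(k, r, sigma2=1.0):
    """Hypothetical helper: equal variances sigma2 and equal correlations r."""
    return [[sigma2 if t == s else sigma2 * r for s in range(k)] for t in range(k)]


identity = compound_symmetric(4, 0.0)
print(box_epsilon(identity))                      # no heterogeneity, no correlation
print(box_epsilon(compound_symmetric(5, 0.3)))    # equal correlations
print(box_epsilon([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 3.0]]))             # unequal column variances
```

The approximating distribution is then F with $(k-1)\epsilon$ and $(k-1)(n-1)\epsilon$ degrees of freedom; evaluating its tail needs an F distribution routine, which the standard library does not provide.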

Between rows test. For testing row means the appropriate ratio of mean squares is $(k-1)Q_R/Q_E$. As we have seen, $Q_R$ and $Q_E$ are not distributed independently and the comparison is biassed unless the average covariance $\bar{v}_{ts}$ is zero.

To obtain under the null hypothesis the exact probability

$p = \Pr\{Q_R/Q_E > \phi_\alpha\}$

where $(k-1)\phi_\alpha = F_\alpha\{n-1, (n-1)(k-1)\}$ is the $\alpha$ probability point of the F distribution with n - 1 and (n - 1)(k - 1) degrees of freedom, we rewrite the probability in the form $\Pr\{(Q_R - \phi_\alpha Q_E) > 0\}$ and employ Theorem 4.3’ as explained in Section 5’.

Let $Z$ be a k(n - 1) x 1 vector of the $Z_{ti}$ arranged in the order $Z_{11}, \ldots, Z_{k1};\ Z_{12}, \ldots, Z_{k2};\ \ldots;\ Z_{1,n-1}, \ldots, Z_{k,n-1}$. Let $V$ be the variance-covariance matrix for the $Z_{ti}$ arranged in this order; thus $V = \mathcal{E}(ZZ')$. Then under the null hypothesis $\sum_{i=1}^{n} \beta_i^2 = \sum_{i=1}^{n-1} B_i^2 = 0$, the quadratic forms $Q_R$ and $Q_E$ are each functions of $Z$,

(4.5)   $Q_R = k\sum_{i=1}^{n-1} \bar{Z}_{.i}^2 = Z'M_R Z$

(4.6)   $Q_E = \sum_{i=1}^{n-1}\sum_{t=1}^{k} (Z_{ti} - \bar{Z}_{.i})^2 = Z'M_E Z.$

We require the probability that $Z'MZ$ exceeds zero, where $M = M_R - \phi_\alpha M_E$. Now $M_R$ is a k(n - 1) x k(n - 1) matrix partitioned after every $k$th row and column, with each of its n - 1 diagonal positions occupied by a k x k matrix $m_R = k^{-1} 1_k 1_k'$ and zeros elsewhere:

(4.7)   $M_R = \begin{pmatrix} m_R & 0_k & \cdots & 0_k \\ 0_k & m_R & \cdots & 0_k \\ \vdots & & \ddots & \vdots \\ 0_k & 0_k & \cdots & m_R \end{pmatrix}.$

Also, $M_E$ and $V$, and hence $V(M_R - \phi_\alpha M_E)$, are of this same form with the k x k matrices in the diagonal positions equal respectively to $m_E = I_k - k^{-1} 1_k 1_k'$, to $v$, and to $v(m_R - \phi_\alpha m_E)$. Hence the (n - 1)k roots of the determinantal equation

(4.8)   $|V(M_R - \phi_\alpha M_E) - \lambda I_{k(n-1)}| = 0$

are the k roots of the equation

(4.9)   $\Delta_\lambda = |v(m_R - \phi_\alpha m_E) - \lambda I_k| = |\{\bar{v}_{t.} - \phi_\alpha(v_{ts} - \bar{v}_{t.}) - \lambda\delta_{ts}\}| = 0$

each repeated n - 1 times, where $\delta_{ts}$ is the Kronecker delta. Thus

(4.10)   $\Pr\{Q_R/Q_E > \phi_\alpha\} = \Pr\left\{\sum_{i=1}^{r'} \lambda_i\chi^2(n-1) + \sum_{j=r'+1}^{k} \lambda_j\chi^2(n-1) > 0\right\},$

where $\lambda_i$ and $\lambda_j$ are respectively positive and negative roots of equation (4.9). No serious lack of generality in conclusions will be introduced if, in the examples we consider, we make the number of rows n odd so that n - 1 is even. Then we can apply Theorem 4.3’ and the required probability is

(4.11)   $\Pr\{Q_R/Q_E > \phi_\alpha\} = \sum_{i=1}^{r'}\sum_{s=1}^{(n-1)/2} \alpha_{is},$

where the $\alpha$'s are obtained from equations (2.24’), (2.25’), and (2.26’).

The theory above may be used to study the distributions of the test criteria 
for any matrix v. We use it here to consider the effect upon the significance 
test when 

(i) the errors are independent but inequality of variance from column to 
column occurs, 


(ii) the errors have equal variance but are serially correlated within rows. 


5. Effect of inequality of column variances in two-way table. If we assume that the variance-covariance matrix $v$ of “errors within rows” is diagonal, with elements $v_{11} = \sigma_1^2, v_{22} = \sigma_2^2, \ldots, v_{kk} = \sigma_k^2$, we have the case in which the variance changes from column to column but the errors are distributed independently.

Between columns test. The matrix $u$ of equation (3.8) reduces to

(5.1)   $u = \{(\delta_{ts} - k^{-1})\sigma_t^2\}.$

Taking n - 1 even we can obtain the exact distribution of $Q_C/Q_E$ under the null hypothesis, using Theorem 4.1’.





On simplification of equation (4.4) we find that $(n-1)Q_C/Q_E$ is distributed approximately as $F\{(k-1)\epsilon, (k-1)(n-1)\epsilon\}$ where

$\epsilon = \{1 + c^2(k-2)/(k-1)\}^{-1}$

and $c$ is the coefficient of variation of the variances, given in equation (8.2’). The calculated values in Table 2 indicate that, as would be expected, the divergences are similar to those for equal groups with the one-way classification.
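A small plain-Python sketch of this special case follows. It is ours, not the paper's, and it rests on one labelled assumption: equation (8.2’) is not reproduced in this section, so $c^2$ is taken here to be the variance of the $\sigma_t^2$ (with divisor k) divided by the square of their mean. With that convention the closed form agrees with what (4.4) gives for a diagonal $v$.

```python
def epsilon_unequal_variances(variances):
    """epsilon = {1 + c^2 (k - 2)/(k - 1)}^(-1) for independent errors.

    Assumption: c^2 = (variance of the column variances, divisor k) / mean^2.
    """
    k = len(variances)
    mean = sum(variances) / k
    c2 = sum((s - mean) ** 2 for s in variances) / k / mean ** 2
    return 1.0 / (1.0 + c2 * (k - 2) / (k - 1))


# Column variances 1, 1, 3 (hypothetical values chosen for illustration).
print(epsilon_unequal_variances([1.0, 1.0, 3.0]))
# Equal variances give epsilon = 1, i.e. no loss of degrees of freedom.
print(epsilon_unequal_variances([2.0, 2.0, 2.0, 2.0]))
# With k = 2 the factor (k - 2) vanishes, so epsilon = 1 whatever the variances.
print(epsilon_unequal_variances([1.0, 9.0]))
```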

Between rows test. Since the covariances $v_{ts}$ are all zero, the comparison of row and error mean squares is not biassed. However, the row and error mean squares are not distributed independently. After substituting $\sigma_t^2$ for $v_{tt}$ and zero for $v_{ts}$ $(t \ne s)$ in $\Delta_\lambda$ of equation (4.9), the resulting determinant may be simplified still further.

Here and in what follows we shall refer to the columns of a determinant, counting from left to right, as $c_1, c_2, \ldots$, etc., and the rows, counting from top to bottom, as $r_1, r_2, \ldots$, etc. By adding $c_2, c_3, \ldots, c_k$ to $c_1$, then subtracting $r_1 \times \sigma_j^2/\sigma_1^2$ from $r_j$ $(j = 2, 3, \ldots, k)$, and finally dividing each row by k and changing signs in the last k - 1 rows, we find that the required k values of $\lambda$ are the solutions of the determinantal equation

(5.2)   $(-1)^{k-1}\Delta_\lambda = \begin{vmatrix} \sigma_1^2 - \lambda & \sigma_1^2(1+\phi_\alpha)/k & \cdots & \sigma_1^2(1+\phi_\alpha)/k \\ \lambda(1 - \sigma_2^2/\sigma_1^2) & \phi_\alpha\sigma_2^2 + \lambda & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \lambda(1 - \sigma_k^2/\sigma_1^2) & 0 & \cdots & \phi_\alpha\sigma_k^2 + \lambda \end{vmatrix} = 0.$

In the one-way classification it appeared that for a given range of variances the greatest discrepancies might be expected when k - 1 of the variances were equal, while the $k$th was larger (say $a$ times as large as the others). Suppose the variances are $1, 1, \ldots, 1, a$. Then (5.2) reduces to

(5.3)   $(\phi_\alpha + \lambda)^{k-2}\left[k\lambda^2 - \{(k-1)(1 - \phi_\alpha a) + (a - \phi_\alpha)\}\lambda - ka\phi_\alpha\right] = 0$

from which all the $\lambda$'s are readily obtained.
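Reading (5.3) as a root of multiplicity k - 2 at $\lambda = -\phi$ together with a quadratic, the $\lambda$'s can be written down directly. The sketch below is ours (the function name `row_test_roots` and the numerical inputs are hypothetical), and the value of $\phi$, the F percentage point divided by k - 1, must be supplied by the user since the standard library has no F quantile function. When $a = 1$ the roots should reduce to 1 and $-\phi$, which is what equal variances give.

```python
import math


def row_test_roots(k, a, phi):
    """Roots of (5.3) for column variances 1, ..., 1, a.

    Quadratic part: k*lam^2 - b*lam - k*a*phi = 0,
    with b = (k - 1)(1 - phi*a) + (a - phi).
    The factor (phi + lam)^(k - 2) contributes the root -phi, k - 2 times.
    """
    b = (k - 1) * (1 - phi * a) + (a - phi)
    disc = math.sqrt(b * b + 4 * k * k * a * phi)
    quadratic_roots = [(b + disc) / (2 * k), (b - disc) / (2 * k)]
    return quadratic_roots + [-phi] * (k - 2)


# Equal variances (a = 1): one root equal to 1, the remaining k - 1 equal to -phi.
print(sorted(row_test_roots(4, 1.0, 0.3)))

# One variance three times the others; phi = 0.5 is an arbitrary illustrative value.
roots = row_test_roots(4, 3.0, 0.5)
print(roots, sum(roots))
```

For these inputs exactly one root is positive, so the positive sum in (4.10) has a single term.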

The results of a number of calculations using methods described above are 
shown in Table 2. It appears that the discrepancies in probability both for the 
test on rows as well as for the test on columns are not very large. 

As was the case for the one-way classification, the effect of column-to-column differences in variance is to cause the significance of column differences in mean to be overestimated, although the differences in variance would have to be large for the effect to become serious. In the row comparisons, discrepancies of similar order but in the opposite direction occur, leading to underestimation of significance. Comparison of the first and third lines with the second and fourth lines in Table 2 shows that the effects are worst when all the variances but one are at the lower end of the range. Comparison of the last four lines in the table suggests that the between-rows discrepancy is worse when the number of rows exceeds the number of columns, while the between-columns discrepancy is worse when the number of columns exceeds the number of rows.

TABLE 2
Probabilities of exceeding 5% point when column variances are unequal in the two-way analysis of variance table

Column headings: number of rows n; number of columns k; column variances; true chance (per cent) of exceeding 5% point (row test, exact; columns test, approx.); values in approximating distribution F(h', h) of ratio of mean squares.*

(The table entries are not legible in the source.)

* Bracketed values show appropriate degrees of freedom when variances are equal.

† 5.98 by the exact method.

‡ 6.75 by the exact method.

6. Effect of serial correlation of errors within rows. Suppose that the normally distributed errors $z_{1i}, z_{2i}, \ldots, z_{ki}$ in the $i$th row of the analysis of variance table all have equal variance $\sigma^2$ but are not distributed independently. Thus $v = \sigma^2\boldsymbol{\rho}$, where $\boldsymbol{\rho} = \{\rho_{ts}\}$ is a k x k matrix with diagonal elements all unity and the element $\rho_{ts}$ of the $t$th column and $s$th row is the coefficient of correlation between $z_{ti}$ and $z_{si}$, the same for all $i$. The theory described above enables us to examine the effect of any such correlation we choose.

A type of correlation of particular interest in practice is serial correlation which might be expected to arise when the observations within rows or columns were made at equally spaced intervals of time or space. This occurs when the rows of the two-way table are associated with a time factor, as in Fisher’s example [1] and in the growth and wear curve examples of [5], or with a space factor as in Daniels’ examples [3].

Normally the first order coefficient $\rho_1$, or $\rho$ as we shall denote it, will be the largest of the serial correlations. We shall study the case where this first order serial correlation is taken into account but the effect of other correlations is ignored. Thus we shall assume

(6.1)   $\boldsymbol{\rho} = \begin{pmatrix} 1 & \rho & 0 & \cdots & 0 & 0 \\ \rho & 1 & \rho & \cdots & 0 & 0 \\ 0 & \rho & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & \rho \\ 0 & 0 & 0 & \cdots & \rho & 1 \end{pmatrix}.$

To ensure positive definiteness we also assume

(6.2)   $|\rho| < [2\cos\{\pi/(k+1)\}]^{-1}.$

“Between columns” test. In order to determine the exact probability $\Pr\{Q_C/Q_E > \psi_0\}$, we require the latent roots of $u$ of equation (3.8). Making the substitution $v = \sigma^2\boldsymbol{\rho}$, where $\boldsymbol{\rho}$ is defined in equation (6.1), and writing $\lambda = \lambda'\sigma^2$, the determinantal equation multiplied by $k/\sigma^2$ is

(6.3)   $|k\boldsymbol{\rho}m - k\lambda' I_k| = \begin{vmatrix} k - (1+\rho) - k\lambda' & k\rho - (1+\rho) & -(1+\rho) & \cdots & -(1+\rho) \\ k\rho - (1+2\rho) & k - (1+2\rho) - k\lambda' & k\rho - (1+2\rho) & \cdots & -(1+2\rho) \\ -(1+2\rho) & k\rho - (1+2\rho) & k - (1+2\rho) - k\lambda' & \cdots & -(1+2\rho) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -(1+\rho) & -(1+\rho) & -(1+\rho) & \cdots & k - (1+\rho) - k\lambda' \end{vmatrix} = 0.$

To solve the equation, the determinant is first reduced to a more tractable form by a series of elementary transformations as follows:

(i) Add $c_2 + c_3 + \cdots + c_k$ to $c_1$.

(ii) Divide $c_1$ by $-k\lambda'$, and add $(2\rho + 1) \times c_1$ to $c_2, c_3, \ldots, c_k$ in turn.

(iii) Substitute $\lambda' = 1 + \rho\theta$ and divide $c_2, \ldots, c_k$ by $k\rho$.

(iv) Add $c_3$ to $c_2$, $c_4$ to $c_3$, ..., $c_k$ to $c_{k-1}$.

(v) Add $r_1$ to $r_2$, $r_1 + r_2$ to $r_3$, ..., $r_1 + r_2 + \cdots + r_{k-1}$ to $r_k$, and multiply $r_k$ by k.

(vi) Add $(k-2)c_1 + 2c_2 + 3c_3 + \cdots + (k-1)c_{k-1}$ to $c_k$, change the sign of $c_k$, and interchange $c_2$ and $c_k$.

(vii) Add a new first row $r_0 = 1\ 0\ 1\ 0\ 0 \cdots 0$ and a new first column $c_0 = 1\ 0\ 0 \cdots 0$, which leaves the value of the determinant unchanged.

(viii) Add $r_0$ to $r_1, r_2, \ldots, r_k$ in turn, and interchange rows and columns.

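We now have the equation in the more manageable form

(6.4)   $\begin{vmatrix} 1 & 1 & 1 & 1 & \cdots & 1 & 1 \\ 0 & 1 & 2 & 3 & \cdots & k-1 & k \\ 1 & -\theta & 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -\theta & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -\theta & 1 \end{vmatrix} = 0,$

the nonzero solutions of which give the required $\lambda$'s via the relation $\lambda_s = (1 + \rho\theta_s)\sigma^2$. Denote the matrix of the determinant of (6.4) by $L$ and a column vector of real numbers $(x_0, x_1, \ldots, x_k)$ by $x$. Then the necessary and sufficient condition that a nontrivial solution exists for the equations $Lx = 0$ is that $|L| = 0$. Thus corresponding to each value of $\theta$ satisfying (6.4) there exists a set of solutions $x_0, x_1, \ldots, x_k$.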

The equations $Lx = 0$ may be written as

(6.5)   $\sum_{t=0}^{k} x_t = 0, \quad \sum_{t=0}^{k} t x_t = 0,$

(6.6)   $x_t - \theta x_{t+1} + x_{t+2} = 0, \quad t = 0, 1, \ldots, k-2.$

The difference equation (6.6) with boundary conditions given by (6.5) is readily solved by standard methods. With $\theta = 2\cos\phi$, a set of solutions

(6.7)   $x_t = e^{i\phi t} \pm e^{i\phi(k-t)}$

is obtained if $\phi = 2s\pi/(k+1)$ or if $(k+2)\sin(\tfrac{1}{2}k\phi) = k\sin\tfrac{1}{2}(k+2)\phi$. In the first case,

(6.8)   $\theta = 2\cos\left(\frac{2s\pi}{k+1}\right), \quad s = \begin{cases} 1, \ldots, \tfrac{1}{2}k; & k \text{ even}, \\ 1, \ldots, \tfrac{1}{2}(k-1); & k \text{ odd}. \end{cases}$

In the second case, the remaining solutions for $\theta$ are most readily obtained by putting $\tau = \tan\tfrac{1}{2}\phi$, yielding


( ki2 . 9 <a 
1D (—1)"(k — 28 — 1) (Sa " (’)*" =0 k even, 


a= 1 — l 


k—-1)/2 ; cates tn ae 
om (—1)*(k — 2s — 1) (' + *) zr” Uwe k odd. 


ee 
a= as 


(6.9) 
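The two families of solutions can be computed numerically. The sketch below is ours and does not transcribe (6.9): for the second case it uses an equivalent polynomial that we derived from the condition $(k+2)\sin(\tfrac{1}{2}k\phi) = k\sin\tfrac{1}{2}(k+2)\phi$ by expanding both sines in powers of $\tan\tfrac{1}{2}\phi$, namely $\sum_{j \ge 1} (-1)^j\, j \binom{k+2}{2j+1} x^{j-1} = 0$ with $x = \tan^2\tfrac{1}{2}\phi$ and $\theta = 2(1-x)/(1+x)$. The function names and the grid-scan root finder are our own choices; the grid size 4001 is picked so that simple rational roots do not fall exactly on a grid point.

```python
from math import comb, cos, pi


def first_case_thetas(k):
    """theta = 2 cos(2 s pi / (k + 1)), s = 1, ..., floor(k / 2), as in (6.8)."""
    return [2 * cos(2 * s * pi / (k + 1)) for s in range(1, k // 2 + 1)]


def second_case_thetas(k, grid=4001):
    """Remaining roots, found by scanning theta in (-2, 2) and bisecting.

    Uses our derived polynomial in x = tan^2(phi / 2); see the lead-in.
    """
    top = (k + 1) // 2
    coeffs = [(-1) ** j * j * comb(k + 2, 2 * j + 1) for j in range(1, top + 1)]

    def g(theta):
        x = (2 - theta) / (2 + theta)
        return sum(c * x ** p for p, c in enumerate(coeffs))

    roots = []
    step = 4 / grid
    lo = -2 + step
    g_lo = g(lo)
    for i in range(2, grid):
        hi = -2 + i * step
        g_hi = g(hi)
        if g_hi == 0:
            roots.append(hi)
        elif g_lo * g_hi < 0:
            a, b, g_a = lo, hi, g_lo
            for _ in range(60):          # plain bisection
                mid = (a + b) / 2
                g_mid = g(mid)
                if g_a * g_mid <= 0:
                    b = mid
                else:
                    a, g_a = mid, g_mid
            roots.append((a + b) / 2)
        lo, g_lo = hi, g_hi
    return roots


def all_thetas(k):
    """All k - 1 values of theta, largest first."""
    return sorted(first_case_thetas(k) + second_case_thetas(k), reverse=True)


for k in (3, 4, 5):
    print(k, [round(t, 4) for t in all_thetas(k)])
```

For k = 3, 4 and 5 the printed values can be compared with the corresponding columns of Table 3.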





TABLE 3
Values of $\theta$ for first order serial correlation

k =    2         3         4         5         6         7         8         9         10

    -1.0000    0.0000    0.6180    1.0000    1.2470    1.4142    1.5321    1.6180    1.6825
              -1.3333   -0.5000    0.1165    0.5486    0.8544    1.0760    1.2405    1.3685
                        -1.6180   -1.0000   -0.4450    0.0000    0.3473    0.6180    0.8308
                                  -1.7165   -1.2153   -0.7258   -0.3057    0.0407    0.3229
                                            -1.8019   -1.4142   -1.0000   -0.6180   -0.2846
                                                      -1.8429   -1.5203   -1.1588   -0.8113
                                                                -1.8794   -1.6180   -1.3097
                                                                          -1.9001   -1.6772
                                                                                    -1.9190


TABLE 4
Between columns test: Values of $\epsilon$

$\rho$      k = 3     k = 5     k = 10

-0.4      0.9576    0.8862    0.8233
-0.2      0.9863    0.9640    0.9453
 0.2      0.9769    0.9507    0.9222
 0.4      0.8832    0.8033    0.7718


Since $\theta = 2(1 - \tau^2)/(1 + \tau^2)$ is a single valued function of $\tau^2$, the polynomial equations (6.9) in $\tau^2$ supply the $\tfrac{1}{2}k - 1$ and $\tfrac{1}{2}(k-1)$ values of $\theta$ required when k is even and odd, respectively, to give with (6.8) the total k - 1 solutions.

Values for k = 2, 3, ..., 10 are shown in Table 3, whence values of the $\lambda$'s may be obtained for any chosen values of $\rho$ and $\sigma^2$ from the relation $\lambda_s = (1 + \rho\theta_s)\sigma^2$. Using these values we may obtain the required probabilities from the exact series of Theorem 4.1’.

If we use the F approximation we consider that the ratio of mean squares $(n-1)Q_C/Q_E$ is distributed approximately as $F\{(k-1)\epsilon, (n-1)(k-1)\epsilon\}$, where

(6.10)   $\epsilon = \{1 + 2\rho^2(k+1)(k-2)^2/(k-1)(k-2\rho)^2\}^{-1}.$

Values of the constant $\epsilon$ for various values of k and $\rho$ are shown in Table 4. Since there is no bias and the effect of moderate correlation does not greatly reduce the degrees of freedom in the approximation, no large discrepancies will be expected in the between-columns comparison of the analysis of variance. Some calculated values are given in Table 6.
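Formula (6.10) is easy to evaluate directly. The following plain-Python sketch is ours (the function name `epsilon_serial` is hypothetical); it reads the garbled equation as $\epsilon = \{1 + 2\rho^2(k+1)(k-2)^2/[(k-1)(k-2\rho)^2]\}^{-1}$, which is what substitution of the first order serial correlation matrix into (4.4) gives.

```python
def epsilon_serial(k, rho):
    """Reduction factor (6.10) for first order serial correlation rho, k columns."""
    excess = 2 * rho ** 2 * (k + 1) * (k - 2) ** 2 / ((k - 1) * (k - 2 * rho) ** 2)
    return 1.0 / (1.0 + excess)


for rho in (-0.4, -0.2, 0.2, 0.4):
    print(rho, [round(epsilon_serial(k, rho), 4) for k in (3, 5, 10)])
```

The values printed for $\rho = \pm 0.4$ can be compared with the first and last rows of Table 4. Note that $\epsilon = 1$ when $\rho = 0$ and also when k = 2, where only one nonzero latent root exists.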

“Between rows” test. As we have seen already, the expectations of the row and error mean squares are equal only if the average covariance $\bar{v}_{ts} = 0$. For the case of first order serial correlation considered above, the expectations, under the null hypothesis, of the row and error mean squares are, respectively, $\{1 + 2\rho(k-1)/k\}\sigma^2$ and $(1 - 2\rho/k)\sigma^2$. The ratio B of these expectations is

(6.11)   $B = 1 + 2\rho k/(k - 2\rho).$

Values of B for a number of values of k and $\rho$ are shown in Table 5. This bias coefficient can be large even with only moderate correlation, and we shall therefore expect discrepancies to arise in the between-rows comparisons. Using equations (4.9), (4.10), and (4.11), exact probabilities for the between-rows test are obtained.

TABLE 5
Between rows test: values of bias B

$\rho$      k = 3     k = 5     k = 10

-0.4      0.3684    0.3103    0.2593
-0.2      0.6471    0.6296    0.6154
 0.2      1.4615    1.4348    1.4167
 0.4      2.0909    1.9524    1.8696


TABLE 6
Probability (per cent) of exceeding 5 per cent point when first order serial correlation between errors within rows is $\rho$, for analysis of variance table with 5 rows and 5 columns

First order serial correlation, $\rho$                      -0.4     -0.2     0.0      0.2      0.4

Exact per cent probability for test on rows                0.03     1.01     5.00     13.05    24.70
Approximate per cent probability for test on columns       5.90     5.??     5.00     5.37     6.68*

* By exact method, per cent probability is 6.43.
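The bias coefficient of (6.11) can be tabulated with a few lines of plain Python. This sketch is ours (the function names are hypothetical); it also checks that B is indeed the ratio of the two expectations $\{1 + 2\rho(k-1)/k\}\sigma^2$ and $(1 - 2\rho/k)\sigma^2$ quoted above.

```python
def bias_ratio(k, rho):
    """B = 1 + 2 rho k / (k - 2 rho), equation (6.11)."""
    return 1.0 + 2.0 * rho * k / (k - 2.0 * rho)


def expectation_ratio(k, rho):
    """Ratio of expected row mean square to expected error mean square."""
    row = 1.0 + 2.0 * rho * (k - 1) / k
    error = 1.0 - 2.0 * rho / k
    return row / error


for rho in (-0.4, -0.2, 0.2, 0.4):
    print(rho, [round(bias_ratio(k, rho), 4) for k in (3, 5, 10)])
```

The printed rows correspond to the rows of Table 5.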


The results of a number of calculations for the case of the two-way table with 
five rows and columns are shown in Table 6. These confirm that very large 
discrepancies in the between-rows test in the directions expected do in fact 
occur, but that the between-columns comparisons are much less seriously affected. 
In particular, the remarks of R. A. Fisher concerning the analysis of the rainfall 
data are seen to be justified. 

Acknowledgement. I am indebted to Mrs. Margaret Edmondson for valuable 
assistance with the calculations. 







REFERENCES 


[1] R. A. Fisher, Statistical Methods for Research Workers, 8th ed., Oliver and Boyd, Edinburgh, 1941, p. 226.

[2] N. Shaw, The Air and its Ways, Cambridge University Press, 1922, p. 211.

[3] H. E. Daniels, “Some problems of statistical interest in wool research,” J. Roy. Stat. Soc., Suppl., Vol. 5 (1938), pp. 89-128.

[4] H. E. Daniels, “The effect of departures from ideal conditions other than non-normality on the t and z tests of significance,” Proc. Cambridge Philos. Soc., Vol. 34 (1938), pp. 321-328.

[5] G. E. P. Box, “Problems in the analysis of growth and wear curves,” Biometrics, Vol. 6 (1950), pp. 362-389.

[6] G. E. P. Box, “Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification,” Ann. Math. Stat., Vol. 25 (1954), pp. 290-302.











A DISTRIBUTION-FREE TEST FOR REGRESSION PARAMETERS 


By H. E. Daniels
Cambridge University, England, and University of Chicago! 


1. Introduction and Summary. Brown and Mood [1], [4] have recently given convenient distribution-free methods of testing and setting up confidence regions for the parameters of a linear regression model. Their technique, which is based on the use of medians, allows the parameters to be considered singly or simultaneously as required. Theil [5] gives two methods of constructing confidence intervals for single parameters, a “complete” method using rank correlation which is valid under the conditions assumed by Brown and Mood, and an “incomplete” method valid under wider conditions but not making full use of the data. For several parameters simultaneously, he obtains confidence regions in the weak sense of covering the true parameter point with probability not less than an assigned value.

In the present paper we give a new distribution-free test for the hypothesis that all regression parameters have specified values, assuming only that the residuals are independent and have probability $\tfrac{1}{2}$ of being positive or negative. It can be used to set up exact confidence regions for the true parameter point. The new test avoids a defect which is shown to appear in the corresponding Brown and Mood test when the sample is not large. The distribution of the test statistic is found explicitly only for the case of two parameters, though in principle the idea extends to any number of parameters. The presence of repeated values of the independent variable necessitates certain modifications in the test, and a method of computing the appropriate distributions in such cases is described.


2. The m test. Suppose we have n pairs of observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$, from a bivariate population such that

(2.1)   $y_i = \alpha + \beta x_i + \epsilon_i$

where $\alpha$, $\beta$ are unknown parameters and the $\epsilon_i$'s are independently distributed errors such that $\Pr(\epsilon_i > 0) = \Pr(\epsilon_i < 0) = \tfrac{1}{2}$ for all $i$. The $x_i$'s are assumed to have assigned values, supposed for the present all to be different and arranged in increasing order. By the usual argument the test we obtain will still be valid if the $x_i$'s have any joint distribution provided that for every set of $x_i$'s the conditional distribution of the $\epsilon_i$'s satisfies the above conditions. We wish to test the hypothesis that $\alpha = \alpha_0$ and $\beta = \beta_0$.

Rewrite (2.1) in the form

(2.2)   $\alpha = (-x_i)\beta + y_i - \epsilon_i.$

Received 9/9/53. 

1 Research carried out at the Statistical Research Center, University of Chicago, under 
sponsorship of the Statistics Branch, Office of Naval Research 









In the $(\beta, \alpha)$ plane (2.2) defines n straight lines with successively decreasing gradients $-x_1, -x_2, \ldots, -x_n$. On the null hypothesis all these lines must pass through the point $(\beta_0, \alpha_0)$. The $\epsilon_i$'s are, however, not known and we consider instead the n lines

(2.3)   $\alpha = (-x_i)\beta + y_i$

which are parallel to the corresponding lines of (2.2). In general they are not concurrent but partition the plane into $\tfrac{1}{2}(n^2 + n + 2)$ polygonal regions of which 2n are open regions extending to infinity and the remaining $\tfrac{1}{2}(n-1)(n-2)$ form a set of contiguous closed regions (see Sec. 7). Each line passes above or below the point $(\beta_0, \alpha_0)$ according as the corresponding $\epsilon_i$ is positive or negative; under the null hypothesis either event is equally likely.

So, speaking crudely, one expects that for typical samples the point $(\beta_0, \alpha_0)$ will be situated somewhere near the middle of the set of closed regions rather than in or near one of the open regions. This idea motivates the following test procedure. Assign a score to each region equal to the minimum number m of lines which have to be crossed to escape from it into one of the open regions. Reject the hypothesis $\alpha = \alpha_0$, $\beta = \beta_0$ if the score m for the region containing $(\beta_0, \alpha_0)$ is significantly low. We shall refer to this test as the m test.


3. Characterization of regions. Let $s_i = \operatorname{sgn}\epsilon_i = \operatorname{sgn}(y_i - \alpha - \beta x_i)$. We call the ordered array of signs $s_1, s_2, \ldots, s_n$ the signature of the sample under the hypothesis $(\beta, \alpha)$. Each of the polygonal regions in the plane is characterized by the signature which the sample would have if the point $(\beta, \alpha)$ lay within it. We call this the signature of the region.

All $2^n$ possible combinations of signs are equally likely, though only $\tfrac{1}{2}(n^2 + n + 2)$ of these appear as signatures of regions for a particular sample. But the 2n open regions must each bear a characteristic signature whatever the sample chosen, since a point $(\beta, \alpha)$ can always be found sufficiently far out in an open region for the signs of the corresponding $\epsilon_i$'s to be unaffected by any given parallel displacements of the lines. In particular there is one open region lying between lines 1 and n for which the $\epsilon_i$'s are all positive, and a conjugate open region on the other side of the figure lying between the same two lines for which the $\epsilon_i$'s are all negative. Starting from each of these regions we can perform a clockwise tour of the open regions, changing the appropriate sign whenever a line is crossed. In this way we obtain the following n pairs of conjugate signatures characterizing the open regions, numbered according to the last line crossed:

(3.1)
    1        − + + ⋯ + +        + − − ⋯ − −
    2        − − + ⋯ + +        + + − ⋯ − −
    ⋮
    n − 1    − − − ⋯ − +        + + + ⋯ + −
    n        − − − ⋯ − −        + + + ⋯ + +

The signatures of the closed regions can be filled in similarly. Fig. 1 shows a particular set of signatures for n = 5. As each successive line is crossed in escaping to an open region from the point $(\beta_0, \alpha_0)$, the signature of the region traversed changes by one sign at a time. The score m for the given sample must therefore be the minimum number of sign disagreements between the sample signature under $(\beta_0, \alpha_0)$ and any of the 2n signatures (3.1).
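The definition of the score translates directly into code. The sketch below is ours, in plain Python with hypothetical function names. It assumes that the $i$th open-region signature has its first $i$ signs negative and the rest positive, together with the conjugate signature obtained by reversing every sign. The n = 11 signature used as an example, + + − + − − + + + − +, is our reading of the partly illegible example in Section 4, where the score is stated to be 4.

```python
from itertools import product


def open_region_signatures(n):
    """The 2n signatures of the open regions: first i signs negative, plus conjugates."""
    signatures = []
    for i in range(1, n + 1):
        s = [-1] * i + [1] * (n - i)
        signatures.append(s)
        signatures.append([-x for x in s])   # conjugate signature
    return signatures


def m_score(signature):
    """Minimum number of sign disagreements with any open-region signature."""
    n = len(signature)
    return min(
        sum(a != b for a, b in zip(signature, ref))
        for ref in open_region_signatures(n)
    )


# Assumed reading of the n = 11 example signature.
sample = [1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1]
print(m_score(sample))

# Largest score attainable over all signatures when n = 5.
print(max(m_score(list(s)) for s in product((1, -1), repeat=5)))
```

In practice the signature would be obtained from the data as the signs of $y_i - \alpha_0 - \beta_0 x_i$ with the observations ordered by increasing $x_i$.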


4. Distribution of m. We now derive the distribution of sample scores. Let $t_i$, $t_i'$ be the numbers of sign disagreements with the $i$th pair of conjugate signatures in (3.1). Obviously $t_i' = n - t_i$ and the sample score is $m = \min_i \min(t_i, n - t_i)$. This cannot exceed $\tfrac{1}{2}n$, and in fact is proved later not to exceed $[\tfrac{1}{2}(n-1)]$. We require $P_n(m_0) = \Pr(m \le m_0)$ where

(4.1)   $1 - P_n(m_0) = \Pr(m > m_0) = \Pr(m_0 < t_i < n - m_0, \text{ all } i).$

The method of arriving at the distribution is most easily understood from an example. Suppose n = 11 and the sample signature is + + − + − − + + + − +. Comparing it in turn with the signatures (3.1) we get the values of $t_i$ shown in the following table.

    i        1    2    3    4    5    6    7    8    9    10   11
    $s_i$    +    +    −    +    −    −    +    +    +    −    +
    $t_i$    5    6    5    6    5    4    5    6    7    6    7
    $w_i$    1    2    1    2    1    0    1    2    3    2    3

The score is m = 4. The last row of the table gives the values of the cumulative sum $w_i = s_1 + s_2 + \cdots + s_i$, where $s_i$ is interpreted as ±1.

In the general case it is evident that if $t$ $(= t_0)$ is the total number of negative signs in the sample signature, $t_i = t + w_i$. In particular, $t_n = n - t = t + w_n$. Since $m_0 < t_n < n - m_0$ is equivalent to $m_0 < t < n - m_0$, (4.1) may be rewritten in the form

(4.2)   $\Pr(m > m_0) = \sum_{t=m_0+1}^{n-m_0-1} p_n(t, m_0), \qquad p_n(t, m_0) = \Pr(m_0 < t + w_i < n - m_0,\ i = 1, 2, \ldots, n;\ w_n = n - 2t).$

Now $p_n(t, m_0)$ is just the probability that, starting at the point $t$ and proceeding by independent equally likely random steps of ±1, one arrives after n steps at the point $n - t$, having avoided absorption on boundaries at the points $m_0$, $n - m_0$. Solving the random walk problem in the usual manner [3] we find

(4.3)   $p_n(t, m_0) = 2^{-n}\sum_{j=-\infty}^{\infty}\left\{\binom{n}{t + j(n - 2m_0)} - \binom{n}{m_0 + j(n - 2m_0)}\right\}.$

Hence, after some reduction,

(4.4)   $P_n(m_0) = \frac{n - 2m_0}{2^{n-1}}\sum_{j=0}\binom{n}{(n - m_0) + j(n - 2m_0)},$

the series terminating at $j = [m_0/(n - 2m_0)]$.
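Formula (4.4) can be checked against direct enumeration of all $2^n$ signatures for small n. The sketch below is ours, in plain Python with hypothetical function names. It reads (4.4) as $P_n(m_0) = (n - 2m_0)\,2^{-(n-1)}\sum_j \binom{n}{(n - m_0) + j(n - 2m_0)}$, with the sum running while the lower index does not exceed n, and it computes the score from the relation $t_i = t + w_i$ given in the text.

```python
from itertools import product
from math import comb


def p_formula(n, m0):
    """P_n(m0) = Pr(m <= m0) from the closed form (4.4) as read above."""
    width = n - 2 * m0
    if width <= 0:
        return 1.0
    total, j = 0, 0
    while (n - m0) + j * width <= n:
        total += comb(n, (n - m0) + j * width)
        j += 1
    return width * total / 2 ** (n - 1)


def score(signs):
    """m = min over i of min(t_i, n - t_i), with t_i = t + w_i."""
    n = len(signs)
    t = sum(1 for s in signs if s < 0)   # number of negative signs
    best, w = n, 0
    for s in signs:
        w += s                            # cumulative sum w_i
        t_i = t + w
        best = min(best, t_i, n - t_i)
    return best


def p_brute(n, m0):
    """Pr(m <= m0) by enumerating all 2^n equally likely signatures."""
    hits = sum(1 for signs in product((1, -1), repeat=n) if score(signs) <= m0)
    return hits / 2 ** n


for n in (5, 7, 8):
    print(n, [round(p_formula(n, m0), 3) for m0 in range((n - 1) // 2 + 1)])
```

For n = 7 and n = 8 with $m_0 = 2$ this gives 63/64 and 7/8, and $m_0 = 0$ gives $n/2^{n-1}$ in every case.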
Confidence regions for $(\beta, \alpha)$ with confidence coefficients $1 - P_n(m_0)$ are provided by the polygons made up of all regions for which $m > m_0$. In particular $P_n(0) = n/2^{n-1}$ is the chance that $(\beta, \alpha)$ lies in an open region, as is otherwise evident from the fact that the open regions account for 2n signatures out of the $2^n$ equally likely possibilities. So the largest closed polygon formed by the lines is a confidence region for $(\beta, \alpha)$ with coefficient² $1 - n/2^{n-1}$.

The maximum possible value³ of m is $[\tfrac{1}{2}(n-1)]$. For when n is odd, say n = 2k + 1, we have

$P_{2k+1}(k) = \frac{1}{2^{2k}}\sum_{j=0}^{k}\binom{2k+1}{k+j+1} = \frac{1}{2^{2k+1}}\sum_{j=0}^{2k+1}\binom{2k+1}{j} = 1$

so that $m \le k$. On the other hand when n is even, say n = 2k,

$P_{2k}(k-1) = \frac{1}{2^{2k-2}}\sum_{j}\binom{2k}{k+2j+1} = \frac{1}{2^{2k-2}}\sum_{j}\left\{\binom{2k-1}{k+2j} + \binom{2k-1}{k+2j+1}\right\} = \frac{1}{2^{2k-1}}\sum_{j=0}^{2k-1}\binom{2k-1}{j} = 1,$

so that $m \le k - 1$. Hence $m \le [\tfrac{1}{2}(n-1)]$ in all cases. Equality is, however, not necessarily attained in every sample. In the example of Fig. 1, n = 5 but m = 0 or 1 only. By moving line 4 parallel to itself until it passes beyond the intersection of lines 3 and 5, a region is formed for which m = 2.

² This result was given in [2] for a particular case of the model (7.1) below.

³ L. J. Savage points out that this is an immediate consequence of the fact that any transversal parallel to one of the lines crosses the n − 1 remaining lines.

Values of $P_n(m_0)$ for n ranging from 3 to 30 are given in Table I. When n is


TABLE I


$P_n(m_0) = \Pr(m \le m_0)$, Cf. Equation (4.1)


1.0 
1.0 
984 
. 875 
-703 
527 


376 999 

258 969 

171 873 1.0 

111 733 1.0 

071 583 909 1.0 
044 444 992 1.0 
027 327 j 944 1.0 
O16. ‘ 52% 850 =. 996 
O01. 725 .964 


006 =. . of -591 =, 887 
003. e 658 .776 
002. ° 356 =. 651 
OO1 . 03% . -265 .526 
001 ° . 193 =. 413 

-137—. 315 
-096 .235 
-066 172 
045 .124 
-030 .087 
020 061 







large (4.4) approximates to 


«© 1 al 
(4.5) Pn(™) ~ 42 >. —= exp[—4(2j7 + 1)°zol 
y=O V/ 24 
where z = (n — 2mo)/+/n. The 5 per cent and 1 per cent values of z are 
3.023 and 3.562, respectively. 
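The quoted percentage points can be checked numerically. The sketch below assumes the large-sample form reads $4z \sum_{j \ge 0} (2\pi)^{-1/2} \exp[-\tfrac12 (2j+1)^2 z^2]$ with $z = (n - 2m_0)/\sqrt{n}$; the function names `p_tail` and `critical_z` are illustrative and do not come from the paper.

```python
import math


def p_tail(z, terms=50):
    """Large-sample approximation to P_n(m_0) as a function of z.

    Assumes the series 4z * sum_j exp(-(2j+1)^2 z^2 / 2) / sqrt(2*pi).
    """
    total = 0.0
    for j in range(terms):
        total += math.exp(-0.5 * (2 * j + 1) ** 2 * z * z)
    return 4.0 * z * total / math.sqrt(2.0 * math.pi)


def critical_z(level, lo=1.0, hi=6.0):
    """Bisection for the z at which p_tail(z) equals `level`.

    p_tail is decreasing for z >= 1, which is where the usual levels fall.
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if p_tail(mid) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(critical_z(0.05), 3), round(critical_z(0.01), 3))
```

Only the first term of the series matters at these levels, so the critical values are essentially the roots of $4z\phi(z) = 0.05$ and $0.01$.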


5. Comparison with the Brown and Mood test. It is of interest to compare the $m$ test with the corresponding Brown and Mood test ([1], p. 407) which is valid under the same assumptions. For convenience we suppose $n$ to be even. Brown and Mood separate the observations into two groups of $\tfrac12 n$ according as the $x_i$'s lie below or above the median. Putting $\alpha = \alpha_0$ and $\beta = \beta_0$, they count the numbers $r_1$, $r_2$ of positive $e_i$'s in the first and second groups, respectively, and reject the hypothesis when

$$A = (8/n)\{(r_1 - n/4)^2 + (r_2 - n/4)^2\} \tag{5.1}$$

is significantly large. For moderately large $n$, $A$ is approximately a $\chi^2$ variable with 2 degrees of freedom.

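The greatest possible value of $A$ is $n$, which it attains only if $r_1$ and $r_2$ are each either 0 or $\tfrac12 n$. From the viewpoint of the present paper $A$ can therefore be regarded as measuring the closeness of agreement of the sample signature with any of the four signatures,

$$
\begin{array}{ll}
\underbrace{+ + \cdots +}_{\frac12 n}\ \underbrace{+ + \cdots +}_{\frac12 n} & \qquad + + \cdots +\ \ - - \cdots - \\
- - \cdots -\ \ + + \cdots + & \qquad - - \cdots -\ \ - - \cdots -
\end{array} \tag{5.2}
$$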


Since the minimum number of sign discrepancies is $\tfrac12 n - |r_1 - n/4| - |r_2 - n/4|$, an alternative statistic more in the spirit of the $m$ test is

$$B = (2/\sqrt{n})(|r_1 - n/4| + |r_2 - n/4|),$$

which for moderately large $n$ is such that

$$\Pr(B \ge B_0) \sim 4\Phi(B_0)(1 - \Phi(B_0)),$$

where $\Phi(x)$ is the cumulative normal distribution function. The 5 per cent and 1 per cent values of $B_0$ are 2.237 and 2.806, respectively.
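As a numerical check on the quoted critical values, the following sketch evaluates the statistic and its large-sample tail. It assumes $B = (2/\sqrt{n})(|r_1 - n/4| + |r_2 - n/4|)$ and the tail $4\Phi(B_0)(1 - \Phi(B_0))$; the names `b_statistic` and `b_tail` are illustrative only.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def b_statistic(r1, r2, n):
    """B computed from the counts of positive residuals in the two half-samples."""
    return (2.0 / math.sqrt(n)) * (abs(r1 - n / 4.0) + abs(r2 - n / 4.0))


def b_tail(b0):
    """Large-sample approximation to Pr(B >= b0) under the null hypothesis."""
    p = norm_cdf(b0)
    return 4.0 * p * (1.0 - p)


if __name__ == "__main__":
    print(b_statistic(8, 8, 16))          # the most extreme sample when n = 16
    print(b_tail(2.237), b_tail(2.806))   # close to 0.05 and 0.01
```

The tail formula is what one gets by noting that $|Z_1| + |Z_2| \le \sqrt{2}\,B_0$ is a square of half-side $B_0$ after a 45 degree rotation, so the acceptance probability is $(2\Phi(B_0) - 1)^2$.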

But whether $A$ or $B$ is used, the fact that the remaining signatures of (3.1) are not considered makes the test inadequate in the following respect. Suppose the $x_i$'s have assigned values. Let $\tfrac12 n$ be even and consider the power of the test with respect to alternatives $(\beta, \alpha)$ such that either


(5.3) Lash < (ap — a) ‘(Be —- 6B) < Ua/441 


Lansa < (ao —_ a)/ (Bo -— | ) % T3n/4+1 - 







The vector $(\beta_0 - \beta, \alpha_0 - \alpha)$ is directed towards one of the four open regions


r — 
+Hivee + 


3 


3 : 
Bites 4a” 


If the true parameter point $(\beta, \alpha)$ is sufficiently distant from $(\beta_0, \alpha_0)$, the value $A = \tfrac12 n$ will be practically certain to occur every time. So, when the significance level is less than $\Pr(A \ge \tfrac12 n)$, the power of the test against such alternatives actually tends to zero as $(\beta, \alpha)$ recedes indefinitely from $(\beta_0, \alpha_0)$. While for large samples $\Pr(A \ge \tfrac12 n)$ is too small to matter, the effect may be important for moderate values of $n$. For example, when $n = 16$ we find, using the exact distribution of $r_1$ and $r_2$, that the first significance level satisfying the above condition is $\Pr(A \ge 8.5) = 0.0152$, which is not unduly small. The phenomenon is illustrated in Fig. 2, which shows contours of the power function of the $A$ test in this particular case, when the residuals are assumed normal with unit variance.

Fig. 2 (horizontal axis: $\beta - \beta_0$)
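The figure 0.0152 for $n = 16$ can be reproduced by direct enumeration. The sketch below assumes that under the null hypothesis $r_1$ and $r_2$ are independent binomial counts with index $n/2$ and probability $\tfrac12$, and that $A = (8/n)\{(r_1 - n/4)^2 + (r_2 - n/4)^2\}$; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def prob_a_at_least(a0, n):
    """Exact null Pr(A >= a0) for the Brown and Mood statistic A.

    Assumes n divisible by 4 and r1, r2 independent Binomial(n/2, 1/2).
    """
    half = n // 2
    centre = Fraction(n, 4)
    total = Fraction(0)
    for r1 in range(half + 1):
        for r2 in range(half + 1):
            a = Fraction(8, n) * ((r1 - centre) ** 2 + (r2 - centre) ** 2)
            if a >= a0:
                # 4**half equals 2**n, the number of equally likely sign patterns
                total += Fraction(comb(half, r1) * comb(half, r2), 4 ** half)
    return total


if __name__ == "__main__":
    print(float(prob_a_at_least(Fraction(17, 2), 16)))  # level just above A = n/2
    print(float(prob_a_at_least(8, 16)))                # includes A = n/2 = 8
```

With $n = 16$ the value $A = \tfrac12 n = 8$ is rejected only at levels of at least $\Pr(A \ge 8)$, so the largest level that leaves it unrejected is $\Pr(A \ge 8.5)$.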


6. Power of the m test. The power of the $m$ test is now discussed. Under the alternative $(\beta, \alpha)$, $\epsilon_i = y_i - \alpha - \beta x_i$ has probability $\tfrac12$ of being positive or negative and

$$y_i - \alpha_0 - \beta_0 x_i = \epsilon_i - (\alpha_0 - \alpha) - (\beta_0 - \beta)x_i. \tag{6.1}$$

Let

$$p_i = \Pr\{\epsilon_i > (\alpha_0 - \alpha) + (\beta_0 - \beta)x_i\}, \qquad q_i = 1 - p_i. \tag{6.2}$$


The probability $P_n(m_0 \mid \alpha, \beta)$ of rejection on the alternative hypothesis is still given by (4.2), but the random walk defining $p_n(t, m_0)$ is now such that the $i$th step takes values $+1$ or $-1$ with probabilities $p_i$ and $q_i$, respectively. A convenient solution of the random walk problem with general $p_i$, $q_i$ is not known, though in particular cases $p_n(t, m_0)$ and hence $P_n(m_0 \mid \alpha, \beta)$ can be computed by the direct step by step procedure, which is not too arduous for moderate values of $n$.

The problem can be solved simply when $p_i = p$ for all $i$. This is the case if the alternatives are $(\beta_0, \alpha)$ and all the $\epsilon_i$'s have the same distribution. Then for all $i$,


(6.3) 
and by standard methods we find 


2D ( 7 \) 
(ft ao _ ' ( n ) Pn ( * ) 
ey, ey = PE ze At + j(n — 2mp) m, + j(n — 2m) f 


$$P_n(m_0 \mid \alpha, \beta_0) = 1 - \sum_{t=m_0+1}^{n-m_0-1} p_n(t, m_0).$$


(6.4) 


When $n$ is large write $z = (n - 2m_0)/\sqrt{n}$ as before, and put

$$p = \tfrac12(1 - \mu/\sqrt{n}), \qquad q = \tfrac12(1 + \mu/\sqrt{n}).$$

The limiting form of the power function turns out to be


x 


P,(mo | a, Bo) = 1 — > e*** (2; + 1m + wu) — &((27 — Ia + y)) 
(6.5) a 1 aiitideatis 
42 J el 70 md ma ne ¢ t “85 £ 
ft 2¢ . > => 
A V 2u 


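which reduces to (4.5) when $\mu = 0$. If the $\epsilon_i$'s have a common density function $f(\epsilon)$, $p_i = \int_{\alpha_0 - \alpha}^{\infty} f(\epsilon)\, d\epsilon \sim \tfrac12 - (\alpha_0 - \alpha) f(0)$, and

$$\mu = 2(\alpha_0 - \alpha) f(0) \sqrt{n}. \tag{6.6}$$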


It is easy to find the limiting form for large $n$ of the power function of the corresponding Brown and Mood test against the same alternatives, using either the $A$ or $B$ statistic. The $A$ distribution has the noncentral $\chi^2$ form with 2 degrees of freedom and parameter $\mu^2$, while

$$\Pr(B \ge B_0 \mid \alpha, \beta_0) \sim 1 - \{2\Phi(B_0) - 1\}\{\Phi(B_0 - \mu) + \Phi(B_0 + \mu) - 1\}. \tag{6.7}$$
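The limiting power of the $B$ test is easy to tabulate. The sketch below assumes the limiting form $1 - \{2\Phi(B_0) - 1\}\{\Phi(B_0 - \mu) + \Phi(B_0 + \mu) - 1\}$ and the 5 per cent point $B_0 = 2.237$ quoted in Section 5; the name `b_power` is illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def b_power(mu, b0=2.237):
    """Limiting power of the B test at shift parameter mu (assumed form of (6.7))."""
    inside = (2.0 * norm_cdf(b0) - 1.0) * (
        norm_cdf(b0 - mu) + norm_cdf(b0 + mu) - 1.0
    )
    return 1.0 - inside


if __name__ == "__main__":
    for mu in (0.0, 0.790, 2.226, 3.557):
        print(mu, round(b_power(mu), 2))
```

At $\mu = 0$ the expression collapses to $1 - (2\Phi(B_0) - 1)^2 = 4\Phi(B_0)(1 - \Phi(B_0))$, the null tail probability of $B$, and the values at $\mu = 0.790$, 2.226 and 3.557 agree with the legible $B$ entries of Table II.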





TABLE II
Asymptotic power functions of four tests, at .05 level, for alternatives $\alpha \ne \alpha_0$, $\beta = \beta_0$. Here $\mu = \sqrt{n}\{1 - 2\Pr(\epsilon > \alpha_0 - \alpha)\}$

    mu        A       B       m             Normal

    0         0.05    0.05    [illegible]   0.05
    0.790     0.10    0.10    [illegible]   0.13
    1.316     0.20    0.20    0.16          0.29
    1.666     0.30    0.30    [illegible]   0.45
    1.958     0.40    0.41    [illegible]   0.59
    2.226     0.50    0.51    0.42          0.71
    2.493     0.60    0.61    0.52          0.81
    2.775     0.70    0.71    0.62          0.89
    3.104     0.80    0.81    0.73          0.95
    3.557     0.90    0.91    0.85          0.99

Columns 1 and 4 were computed from Table of Noncentral $\chi^2$, by Evelyn Fix, University of California Press, 1949, and "Charts of the power function for analysis of variance tests, derived from the noncentral F distribution," by E. S. Pearson and H. O. Hartley, Biometrika, Vol. 38, Parts I and II (June, 1951).


In Table II the large sample power functions of the $m$, $A$ and $B$ tests are compared for such alternatives. The $m$ test turns out to be somewhat less powerful than the $A$ or $B$ tests against these alternatives, as might be expected from the way the latter were constructed. The corresponding limiting power function of the standard $F$ test under the normality assumption is also tabulated. For large $n$ the distribution of $F$ approximates to that of noncentral $\chi^2$ with 2 degrees of freedom and parameter $\tfrac12 \pi \mu^2$.
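The $A$ and Normal columns can be approximated without tables. The sketch below takes the noncentrality parameters to be $\mu^2$ for the $A$ test and $\tfrac12 \pi \mu^2$ for the $F$ test (the readings assumed here), and evaluates the noncentral $\chi^2$ tail with 2 degrees of freedom as a Poisson mixture of central $\chi^2$ tails. This is a standard identity and is not the tabular method the paper used; all function names are illustrative.

```python
import math


def noncentral_chi2_sf_2df(x, lam, terms=200):
    """Pr(X > x) for a noncentral chi-square with 2 d.f. and noncentrality lam.

    Uses X ~ chi-square with 2 + 2K d.f., K ~ Poisson(lam / 2), together with
    Pr(chi2_{2+2k} > x) = exp(-x/2) * sum_{i<=k} (x/2)^i / i!.
    """
    half_x = 0.5 * x
    half_lam = 0.5 * lam
    weight = math.exp(-half_lam)   # Poisson probability of K = 0
    term = 1.0                     # (x/2)^i / i! for i = 0
    partial = 1.0                  # running sum of those terms up to i = k
    total = 0.0
    for k in range(terms):
        total += weight * math.exp(-half_x) * partial
        weight *= half_lam / (k + 1)
        term *= half_x / (k + 1)
        partial += term
    return total


CRITICAL_05 = -2.0 * math.log(0.05)   # upper 5 per cent point of chi-square, 2 d.f.


def power_a(mu):
    """Limiting power of the A test, assuming noncentrality mu^2."""
    return noncentral_chi2_sf_2df(CRITICAL_05, mu * mu)


def power_f(mu):
    """Limiting power of the F test, assuming noncentrality pi * mu^2 / 2."""
    return noncentral_chi2_sf_2df(CRITICAL_05, 0.5 * math.pi * mu * mu)


if __name__ == "__main__":
    print(round(power_a(2.226), 3), round(power_f(2.226), 3))
```

At $\mu = 2.226$ this gives about 0.50 for $A$ and about 0.70 to 0.71 for the $F$ test, in line with the tabulated row.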


7. Some generalizations. The $m$ test can also be used for the parameters of the model

$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i \tag{7.1}$$

under similar assumptions about the $x$'s and the $\epsilon$'s. The lines in the $(\beta_2, \beta_1)$ plane are

$$\beta_1 = (-x_{2i}/x_{1i})\beta_2 + y_i/x_{1i}. \tag{7.2}$$

If they are numbered in order of increasing $x_{2i}/x_{1i}$, the argument goes through exactly as before.

In principle the $m$ test can be extended to the case of $k$ parameters, though the distribution of $m$ is not easily obtainable for $k > 2$. The model is then

$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \epsilon_i. \tag{7.3}$$





The $n$ hyperplanes

$$\beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} = y_i \tag{7.4}$$

partition the $k$-dimensional parameter space into $1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{k}$ regions of which $2\{1 + \binom{n-1}{1} + \binom{n-1}{2} + \cdots + \binom{n-1}{k-1}\}$ are open and $\binom{n-1}{k}$ are closed. This is proved as follows.

Let $T_{n,k}$ be the number of regions into which a $k$ dimensional Euclidean space is partitioned in general by $n$ hyperplanes (i.e. $[k - 1]$ flats). Let $O_{n,k}$ of these be open and the remaining $C_{n,k}$ closed.

Consider the effect of adding one more hyperplane. It adds a new region for every existing region it intersects. The number of new regions added is therefore $T_{n,k-1}$, since the new hyperplane is itself partitioned into $T_{n,k-1}$ regions by the $n$ existing hyperplanes. Hence $T_{n+1,k} = T_{n,k} + T_{n,k-1}$.


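Clearly $T_{0,k} = 1$ for all $k$. Also $T_{n,1} = n + 1$, so that we can take $T_{n,0} = 1$. Let

$$G_n(z) = \sum_{k=0}^{\infty} T_{n,k} z^k.$$

Then $G_{n+1}(z) = (1 + z)G_n(z)$. Since $G_0(z) = 1/(1 - z)$, $G_n(z) = (1 + z)^n/(1 - z)$. Hence

$$T_{n,k} = 1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{k}.$$

The number of new closed regions added by the extra hyperplane is the same as the number of closed regions formed in itself by intersections with the $n$ existing hyperplanes. Hence $C_{n+1,k} = C_{n,k} + C_{n,k-1}$. Since $C_{n,1} = n - 1$ we can take $C_{n,0} = 1$. Let

$$H_n(z) = \sum_{k=0}^{\infty} C_{n,k} z^k.$$

Then $H_{n+1}(z) = (1 + z)H_n(z)$. We have $C_{1,0} = 1$ and $C_{1,k} = 0$ for $k \ge 1$, so that $H_1(z) = 1$ and $H_n(z) = (1 + z)^{n-1}$, and $C_{n,k} = \binom{n-1}{k}$. It follows easily that the number of open regions can be written as

$$O_{n,k} = 2\left\{1 + \binom{n-1}{1} + \binom{n-1}{2} + \cdots + \binom{n-1}{k-1}\right\}.$$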


As before the sample signature is compared with the signatures of the open regions to find $m$. Note that the largest closed polyhedron is a confidence region with coefficient

$$1 - \frac{1}{2^{n-1}}\left\{1 + \binom{n-1}{1} + \binom{n-1}{2} + \cdots + \binom{n-1}{k-1}\right\}.$$
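The region counts of this section can be checked by running the recurrence directly. The sketch below assumes the recurrence $T_{n+1,k} = T_{n,k} + T_{n,k-1}$ with $T_{0,k} = T_{n,0} = 1$, closed count $\binom{n-1}{k}$ and open count $2\sum_{j<k}\binom{n-1}{j}$; the function names are illustrative.

```python
from math import comb


def total_regions(n, k):
    """T_{n,k}: regions cut out of k-space by n hyperplanes in general position.

    Built from T_{n+1,k} = T_{n,k} + T_{n,k-1}, with T_{0,k} = 1 and T_{n,0} = 1.
    """
    row = [1] * (k + 1)                     # T_{0,0}, ..., T_{0,k}
    for _ in range(n):
        row = [1] + [row[j] + row[j - 1] for j in range(1, k + 1)]
    return row[k]


def closed_regions(n, k):
    """Number of closed (bounded) regions, binom(n-1, k)."""
    return comb(n - 1, k)


def open_regions(n, k):
    """Number of open (unbounded) regions, 2 * sum_{j<k} binom(n-1, j)."""
    return 2 * sum(comb(n - 1, j) for j in range(k))


def confidence_coefficient(n, k):
    """Coefficient of the largest closed polyhedron: 1 - open / 2^n."""
    return 1.0 - open_regions(n, k) / 2.0 ** n


if __name__ == "__main__":
    print(total_regions(9, 2), open_regions(9, 2), closed_regions(9, 2))
    print(confidence_coefficient(9, 2))
```

For $k = 2$ the open count reduces to $2n$ and the coefficient to $1 - n/2^{n-1}$, as in the straight-line case, and nine lines in the plane give 46 regions in all.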


8. Repeated values of x. So far the possibility of repeated or "tied" values of $x$ has been excluded. The presence of such tied $x$'s introduces a difficulty similar to that found with the Brown and Mood test. We return to model (2.1) and consider the example for $n = 6$, illustrated in Fig. 3, where $x_3 = x_4 = x_5$.





Fig. 3

The lines corresponding to the three tied $x_i$'s are parallel, and the numbers 3, 4, 5 can be assigned at random to the three lines. With the ordering shown, the open regions of signature $- - - + + +$, $- - - - + +$ have disappeared while the regions $- - + - - +$, $- - + + - +$ previously regarded as closed are now open. With the ordering 543 replacing 345, the conjugate regions are the ones affected, while with any other ordering such as 435 one open region on either side is replaced by a region previously considered closed.

By assigning a random order to the tied $x_i$'s the $m$ test for $(\beta_0, \alpha_0)$ may still be applied as before, even when such ties are present. However, there will exist significance levels such that, for alternatives $(\beta, \alpha)$ lying in the direction of the tied lines, the power of the test never attains unity no matter how distant the point $(\beta, \alpha)$ is from $(\beta_0, \alpha_0)$, since some previously closed regions must extend to infinity. To avoid this difficulty the test has to be modified by relabelling the open regions.

Suppose the $x_i$'s to occur in $l$ tied groups with $n_j$ in the $j$th group and $\sum_j n_j = n$. Thus

$$x_1 = x_2 = \cdots = x_{n_1} < x_{n_1+1} = x_{n_1+2} = \cdots = x_{n_1+n_2} < \cdots < x_{n-n_l+1} = \cdots = x_n.$$

Fig. 4 illustrates the case $l = 3$, $n_1 = 3$, $n_2 = 4$, $n_3 = 2$. The 9 lines produce only 36 regions in the plane instead of the full 46, and in general $\tfrac12 n_j(n_j - 1)$ regions disappear for each tied group of $n_j$ lines. Of course, some of the values of $y$ in a tied group may also coincide, but in such cases we regard the corresponding regions as being present but of zero width. The test statistic is again the minimum number $m$ of lines to be crossed in escaping to infinity from the point $(\beta_0, \alpha_0)$.





Fig. 4


The open regions are of two types, (i) those lying between two successive bands of tied lines and (ii) those formed between tied lines of the same group. The signatures of type (i) regions are invariant under sampling (for fixed $x_i$'s) while those of type (ii) are not. In particular there is one type (i) region for which every $e_i$ is positive. We shall write its signature in the contracted form

$$(+)\ (+)\ (+)\ \cdots\ (+) \tag{8.1}$$

where the $j$th bracket $(+)$ stands for the $n_j$ plus signs corresponding to the $j$th tied group. The open regions may then be characterized in the following way. Starting at the region just mentioned, we describe a clockwise circuit of the open regions. As we move through the first group of type (ii) regions the first $(+)$ changes successively into $n_1 - 1$ different sets of $+$'s and $-$'s, not all $-$, but the remaining brackets are unchanged. The particular combinations of signs for the $n_1 - 1$ regions will depend on the given sample, but in random sampling all $2^{n_1} - 2$ combinations of signs which are not all $+$ or all $-$ are possible for these regions. On passing beyond them into the next type (i) region every $e_i$ for $i = 1, 2, \cdots, n_1$ becomes negative and the signature of this region may be written in contracted notation as

$$(-)\ (+)\ (+)\ \cdots\ (+) \tag{8.2}$$

We introduce the symbol $(\sim)$ to denote any set of signs whatever for the $e_i$'s of a particular tied group of lines. Then the signature of any of the regions so far considered is included in the formula

$$(\sim)\ (+)\ (+)\ \cdots\ (+) \tag{8.3}$$





Therefore, this may be called the signature of the whole region lying between the last line of the $l$th tied group and the first line of the second tied group. Similarly

$$(-)\ (\sim)\ (+)\ (+)\ \cdots\ (+) \tag{8.4}$$

characterizes the region extending from the last line of the first tied group to the first line of the third tied group. The two regions (8.3) and (8.4) are of course not disjoint, since both contain (8.2). Proceeding in this way we can cover all open regions by the following overlapping set of regions arranged in $l$ conjugate pairs:

$$
\begin{array}{lll}
1 & (\sim)\ (+)\ (+)\ \cdots\ (+)\ (+) & (\sim)\ (-)\ (-)\ \cdots\ (-)\ (-) \\
2 & (-)\ (\sim)\ (+)\ \cdots\ (+)\ (+) & (+)\ (\sim)\ (-)\ \cdots\ (-)\ (-) \\
\vdots & & \\
l - 1 & (-)\ (-)\ (-)\ \cdots\ (\sim)\ (+) & (+)\ (+)\ (+)\ \cdots\ (\sim)\ (-) \\
l & (-)\ (-)\ (-)\ \cdots\ (-)\ (\sim) & (+)\ (+)\ (+)\ \cdots\ (+)\ (\sim)
\end{array} \tag{8.5}
$$


9. The modified m distribution. Suppose the sample signature under $(\beta_0, \alpha_0)$ consists successively of $r_1$ $+$'s out of $n_1$, $r_2$ $+$'s out of $n_2$, $\cdots$, $r_l$ $+$'s out of $n_l$. No ordering of the signs within tied groups is necessary. The numbers of disagreements of sign with the $j$th pair of conjugate signatures in (8.5) are $d_j$, $d_j'$, where $d_j + d_j' = n - n_j$, and

$$d_j = r_1 + r_2 + \cdots + r_{j-1} + (n_{j+1} - r_{j+1}) + (n_{j+2} - r_{j+2}) + \cdots + (n_l - r_l). \tag{9.1}$$

There are no disagreements of sign with the $j$th group, by definition of $(\sim)$.

We require the distribution of

$$m = \min_j \min(d_j,\ n - n_j - d_j). \tag{9.2}$$
7 


It is perhaps unfortunate, when there are no tied $x$'s, that $d_j$ does not reduce to the previous $t_j$ since the characterization of the open regions is different, but the present method seems most expeditious in the tied situation. On writing $s_j = 2r_j - n_j$, (9.1) becomes

$$d_j = \tfrac12(n - n_j) + \tfrac12(s_1 + s_2 + \cdots + s_{j-1} - s_{j+1} - s_{j+2} - \cdots - s_l) \tag{9.3}$$

and

$$
\begin{aligned}
\Pr(m > m_0) &= \Pr(m_0 < d_j < n - n_j - m_0,\ j = 1, 2, \cdots, l) \\
&= \Pr\{\,|s_1 + s_2 + \cdots + s_{j-1} - s_{j+1} - s_{j+2} - \cdots - s_l| < n - n_j - 2m_0,\ j = 1, 2, \cdots, l\}.
\end{aligned} \tag{9.4}
$$





The special case where $l = 2$ and the tied $x$'s are respectively $-1$ and $+1$ is of interest. The model is

$$
\begin{aligned}
y_i &= \alpha - \beta + \epsilon_i, & i &= 1, 2, \cdots, n_1, \\
y_j &= \alpha + \beta + \epsilon_j, & j &= n_1 + 1, n_1 + 2, \cdots, n_1 + n_2.
\end{aligned}
$$

We are simultaneously testing whether the medians of two independently sampled populations are respectively $\alpha_0 - \beta_0$ and $\alpha_0 + \beta_0$. Then (9.4) reduces to

$$\Pr(m > m_0) = \Pr\{\,|s_1| < n_1 - 2m_0,\ |s_2| < n_2 - 2m_0\}$$

and we arrive at a particular kind of simultaneous sign test.


TABLE III 


Values of $P_n(m_0)$ for modified $m$ test with $l$ groups of $\nu$ tied $x$'s


1.0 

736 985 

40 671 

117 005 =. . S8E ‘ 0 

034 . 109 2k . ° 984 1.0 

009 033 ‘ ’ 745 .950 1.0 
.022 ooo. Ct; . 414 -675 .901 


In general it is not possible to derive a convenient formula for (9.4), but particular cases can be evaluated by reformulating (9.4) in terms of a random walk with rather unusual boundary conditions and proceeding step by step. Write $w_j = s_1 + s_2 + \cdots + s_j$ and let $d$ be the total number of negative signs in the sample signature. Then if $w_0 = 0$,

$$d_j = d - \tfrac12 n_j + \tfrac12(w_{j-1} + w_j). \tag{9.5}$$

Also $w_l = n - 2d$, and for $m > m_0$, $d$ can range from $m_0 + 1$ to $n - m_0 - 1$. Hence (9.4) becomes

$$\Pr(m > m_0) = \sum_{d=m_0+1}^{n-m_0-1} \Pr\{m_0 + \tfrac12 n_j < d + \tfrac12(w_{j-1} + w_j) < n - m_0 - \tfrac12 n_j,\ j = 1, \cdots, l;\ d + w_l = n - d\}. \tag{9.6}$$





We therefore consider the following random walk. Start at the point $d$ and proceed in steps of $s_j$ which can take values $2r_j - n_j$ with probabilities $2^{-n_j}\binom{n_j}{r_j}$. Absorption on boundary points at $m_0 + \tfrac12 n_j$, $n - m_0 - \tfrac12 n_j$ occurs when the midpoint of the step falls on or beyond these points. Thus it is possible for the path to overshoot the boundaries to some extent but not to stay outside for more than one step. The probability of arriving at the point $n - d$ after $l$ steps is computed by enumerating the appropriate paths using a typical "binomial triangle" technique. Summation over $d$ then gives $\Pr(m > m_0) = 1 - P_n(m_0)$. The distributions for equal groups of 2 and 3 given in Table III were computed in this way.

The case $m_0 = 0$ can be handled directly by observing that the open regions of type (i) account for $2l$ signatures while those of type (ii) can have $2\sum_{j=1}^{l}(2^{n_j} - 2)$ possible signatures; the largest closed polygon is therefore a confidence region with confidence coefficient

$$1 - P_n(0) = 1 - \frac{1}{2^{n-1}}\left(\sum_{j=1}^{l} 2^{n_j} - l\right).$$
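For small tied designs the null distribution of the modified $m$ can be checked by brute force. The sketch below is not the "binomial triangle" random walk described above; it simply enumerates every vector $(r_1, \cdots, r_l)$, and it assumes that $d_j$ is the sum of the $r_i$ for $i < j$ plus the sum of $(n_i - r_i)$ for $i > j$, with $m = \min_j \min(d_j, n - n_j - d_j)$. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def modified_m_cdf(group_sizes, m0):
    """Exact null Pr(m <= m0) for tied groups of sizes n_1, ..., n_l.

    Each r_j (number of + signs in group j) is Binomial(n_j, 1/2) and the
    groups are independent, so every r-vector has weight prod C(n_j, r_j) / 2^n.
    """
    n = sum(group_sizes)
    l = len(group_sizes)
    prob = Fraction(0)
    for r in product(*(range(nj + 1) for nj in group_sizes)):
        weight = 1
        for nj, rj in zip(group_sizes, r):
            weight *= comb(nj, rj)
        m = n
        for j in range(l):
            # disagreements with the j-th signature (-)...(-)(~)(+)...(+)
            d = sum(r[:j]) + sum(
                nk - rk for nk, rk in zip(group_sizes[j + 1:], r[j + 1:])
            )
            m = min(m, d, n - group_sizes[j] - d)
        if m <= m0:
            prob += Fraction(weight, 2 ** n)
    return prob


if __name__ == "__main__":
    print(modified_m_cdf((3, 4, 2), 0))   # the design of Fig. 4
    print(modified_m_cdf((2, 2), 0))
```

For $m_0 = 0$ the enumeration agrees with the count $(\sum_j 2^{n_j} - l)/2^{n-1}$, and with no ties (all $n_j = 1$) it reduces to $n/2^{n-1}$.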

We finally remark that even when some values of $x$ are not completely coincident, the corresponding lines in the $(\beta, \alpha)$ plane may be so nearly parallel that one would intuitively prefer to use the test which treats them as tied (though they would be kept distinct in calculating the $e$'s). But a rule for deciding between the two tests in such cases would have to depend on a comprehensive comparison of their power functions.


10. Acknowledgments. I am much indebted to W. H. Kruskal for many useful comments, and to W. Goldfarb and G. Chow for computational assistance.


REFERENCES 

[1] G. W. Brown and A. M. Mood, "On median tests for linear hypotheses," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1950, pp. 159-166.

[2] H. E. Daniels, "The theory of position finding," J. Roy. Stat. Soc. (B), Vol. 13, No. 2 (1951), pp. 186-207.

[3] W. Feller, An Introduction to Probability Theory and Its Applications, John Wiley and Sons, 1950.

[4] A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill, New York, 1950.

[5] H. Theil, "A rank-invariant method of linear and polynomial regression analysis," Indagationes Math., Vol. 12 (1950), pp. 85-91, 173-177, 467-482.





ON THE ASYMPTOTIC EFFICIENCY OF CERTAIN 
NONPARAMETRIC TWO-SAMPLE TESTS 


By A. M. Mood
The RAND Corporation 


1. Summary. In this paper the following asymptotic efficiencies are computed for the given two-sample tests against normal alternatives to the null hypothesis:

    rank test for location               3/π = 95%
    median test for location             2/π = 64%
    run test for location                0
    run test for dispersion              0
    square rank test for dispersion      15/2π² = 76%


Also, general expressions for means and variances of some of these test criteria 
are found for distributions alternative to the null hypothesis. 


2. Asymptotic power of a test. Let $T_n$ be a statistic which is a function of $n$ sample observations $x_i$ (with $i = 1, 2, \cdots, n$) from a population with distribution $F(x; \theta)$. Let the mean of $T_n$ be $\mu_n(\theta)$ and the variance of $T_n$ be $\sigma_n^2(\theta)$, and suppose that $T_n$ is asymptotically normally distributed for all $\theta$ in a neighborhood of $\theta_0$. Let $h_n(\theta)$ be the power function of $T_n$ for testing the null hypothesis that $\theta = \theta_0$. For large samples

$$h_n(\theta) = P[\,|T_n - \mu_n(\theta_0)| > k\sigma_n(\theta_0)] \tag{1}$$

$$h_n(\theta) \cong 1 - \int_{\mu_n(\theta_0) - k\sigma_n(\theta_0)}^{\mu_n(\theta_0) + k\sigma_n(\theta_0)} \frac{1}{\sqrt{2\pi}\,\sigma_n(\theta)} \exp\{-[y - \mu_n(\theta)]^2 / 2\sigma_n^2(\theta)\}\, dy, \tag{2}$$

where $k$ is determined by the chosen significance level $\alpha$ of the test.

We suppose further that $\mu_n(\theta)$ and its first two derivatives are of order one in the neighborhood of $\theta_0$ and that $\sigma_n(\theta)$ and its first two derivatives are of order $1/\sqrt{n}$ in the neighborhood of $\theta_0$. Then it is evident from (2) that $h_n(\theta)$ is essentially unity except in a neighborhood of $\theta_0$; of course $h_n(\theta_0) = \alpha$. On evaluating the first two derivatives of $h_n(\theta)$ at $\theta_0$ one finds that the coefficient of the term of order $\sqrt{n}$ vanishes in $h_n'(\theta_0)$, so that it is of order one, and also that the significant term of $h_n''(\theta_0)$ is of order $n$. Thus in the neighborhood of $\theta_0$ we have essentially (letting $\phi$ represent the normalized normal density function)

$$h_n(\theta) = \alpha + \frac{k\phi(k)}{\sigma_n^2(\theta_0)} \left(\frac{d\mu_n}{d\theta_0}\right)^2 (\theta - \theta_0)^2 \tag{3}$$

when $|\theta - \theta_0| < 1/\sqrt{n}$. We are interested in (3) because it enables us to evaluate the asymptotic efficiency of certain tests without having to evaluate the variance of the test criterion under the alternative hypothesis. In fact, if

Received: 5/19/52, revised 5/5/54. 





$T_n^*$, with mean $\mu_n^*$, variance $\sigma_n^{*2}$, and power $h_n^*$, is the best criterion for testing $\theta = \theta_0$, the asymptotic efficiency of $T_n$ is defined to be

$$E = \lim_{n \to \infty} \frac{1}{\sigma_n^2(\theta_0)} \left(\frac{d\mu_n}{d\theta_0}\right)^2 \Bigg/ \frac{1}{\sigma_n^{*2}(\theta_0)} \left(\frac{d\mu_n^*}{d\theta_0}\right)^2, \tag{4}$$

the limiting ratio of $h_n(\theta) - \alpha$ to $h_n^*(\theta) - \alpha$ using (3). This definition is consistent with Fisher's definition of efficient estimators.

For one-sided tests $h_n(\theta)$ is essentially linear instead of parabolic,

$$h_n(\theta) = \alpha + \frac{\phi(k)}{\sigma_n(\theta_0)} \frac{d\mu_n}{d\theta_0} (\theta - \theta_0), \tag{5}$$

in the neighborhood of $\theta_0$. Thus the ratio of $h_n(\theta) - \alpha$ to $h_n^*(\theta) - \alpha$ is the square root of (4). For this reason (asymptotically) tests which are not 100% efficient are less unsatisfactory for one-sided tests than for two-sided tests. A more detailed discussion of asymptotic efficiency will be published by Dwass [4]; see also Levene [11] and Noether [13].
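The parabolic and linear local forms of the power function can be compared with the exact normal-theory power. The sketch below works in the standardized shift $\delta = (d\mu_n/d\theta_0)(\theta - \theta_0)/\sigma_n(\theta_0)$ and assumes, for illustration only, that the criterion is exactly normal with constant variance, so the two-sided power is $1 - \Phi(k - \delta) + \Phi(-k - \delta)$; the function names are not from the paper.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_sided_power(delta, k):
    """Exact power of the two-sided test under a standardized shift delta."""
    return 1.0 - norm_cdf(k - delta) + norm_cdf(-k - delta)


def two_sided_quadratic(delta, k):
    """Parabolic local approximation: alpha + k * phi(k) * delta^2."""
    alpha = 2.0 * (1.0 - norm_cdf(k))
    return alpha + k * phi(k) * delta ** 2


def one_sided_power(delta, k):
    """Exact power of the one-sided test under a standardized shift delta."""
    return 1.0 - norm_cdf(k - delta)


def one_sided_linear(delta, k):
    """Linear local approximation: alpha + phi(k) * delta."""
    return (1.0 - norm_cdf(k)) + phi(k) * delta


if __name__ == "__main__":
    print(two_sided_power(0.1, 1.96), two_sided_quadratic(0.1, 1.96))
    print(one_sided_power(0.1, 1.645), one_sided_linear(0.1, 1.645))
```

Because the excess power is proportional to $\delta^2$ in the two-sided case and to $\delta$ in the one-sided case, a test whose $\delta$ is smaller by a factor $\sqrt{E}$ loses a factor $E$ of excess power in the first case and only $\sqrt{E}$ in the second.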

[In the application of (4) to the various statistics investigated in this paper, 
we require something akin to asymptotic normality of the statistic, uniformly 
in a neighborhood of the null hypothesis. Generally, as pointed out by a referee, 
proofs existing in the literature of asymptotic normality do not provide this 
strong a result and the validity of our computations is not completely justified. 
However, a careful analysis of power functions by Andrews [1] shows how in 


such cases the usual limiting distributions may be obtained for certain sequences 
of alternatives to the null hypothesis. Within this framework, at least, the 
computations are valid.] 


In the evaluation of various nonparametric tests to follow, we shall refer to normal distributions so that $T_n^*$ is well-known. In this sense, the results are quite specialized. However, the intent of the computations is to furnish evidence for a more general evaluation. The choice of the normal distribution puts the nonparametric tests in competition with $t$ and $F$ tests, which are known to be most powerful for that distribution. If a nonparametric test looks bad relative to the $t$ or $F$ test (assuming normality), then one can observe only that the test should not be used when there is assurance of sensible normality. However, a nonparametric test which is found to be nearly 100% efficient relative to the $t$ or $F$ test has much to recommend it: not much is lost by using it even in the case of normality and it is a reasonable presumption that the test will behave approximately as well (as the $t$ or $F$ test) for other distributions. Thus such a test enables one to avoid the assumption of normality at negligible cost when one has little knowledge of the shape of the population distribution.

On the other hand, one must expect that a nonparametric test which compares favorably with the $t$ test, for example, assuming normal distributions, probably will be unsatisfactory for distributions for which the $t$ test is known to be poor.





3. Median test for two samples. The median test [2], [18], [19] of whether two populations have the same location (assuming they are otherwise the same) consists of testing whether the number $u$ of $x$ observations to the left of the joint median $z$ of the two samples differs significantly from half the total number of $x$ observations. This test applied to normal distributions with the same variance will be shown in this section to have an asymptotic efficiency of $2/\pi$; hence it is of much less interest than the rank test (Section 5) in dealing with essentially normal distributions.

We consider samples of $m$ $x$'s and $n$ $y$'s from populations with distributions $F(x)$ and $G(y)$, but to make an unessential simplification we suppose $m + n = 2r + 1$. The joint distribution of $u$ and $z$ is then given by

$$
\begin{aligned}
H(u, z) = {} & m\binom{m-1}{u}\binom{n}{r-u} F^{u}(z)[1 - F(z)]^{m-1-u} G^{r-u}(z)[1 - G(z)]^{n-r+u}\, dF(z) \\
& + n\binom{m}{u}\binom{n-1}{r-u} F^{u}(z)[1 - F(z)]^{m-u} G^{r-u}(z)[1 - G(z)]^{n-1-r+u}\, dG(z),
\end{aligned} \tag{6}
$$

where the first term arises when $z$ is an $x$ observation and the second when $z$ is a $y$ observation. In this expression we put

$$u = mF(c) + v\sqrt{m}, \qquad z = c + w/\sqrt{m}, \tag{7}$$

where $c$ satisfies

$$mF(c) + nG(c) = (m + n)/2. \tag{8}$$


Then we use Stirling's formula on the factorials, take logarithms, and neglect terms of order $1/\sqrt{m}$ in the usual fashion to find that $v$ and $w$ are asymptotically normally distributed. The quadratic form is found to be


2 1 m = 2n0) Jf eb? sl 
: E <i a2 7 | ee iFG-P Ga-® 


2 r ng 
lin E >A ao = a | 


(9)

in which $f$ and $g$ are the first derivatives of $F$ and $G$, and all four functions are evaluated at $c$; we must assume the derivatives do not vanish.

When $F = G$, the middle term of (9) vanishes so that $v$ and $w$ are independently distributed and we easily find the variance of $v$ to be $n/4(m + n)$ under the null hypothesis. The variance of $u$, when $F = G$, is thus $mn/4(m + n)$.

Now we put

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad g(x) = \frac{1}{\sqrt{2\pi}} e^{-(x - \xi)^2/2}, \tag{10}$$

to compute the asymptotic efficiency for the normal case. We must first evaluate, to use (4),

$$\frac{d}{d\xi}\, mF(c) = m f(c) \frac{dc}{d\xi} \tag{11}$$

at $c = 0$. There is no closed solution of (8) for $c$ in terms of $\xi$, but we are interested only in the neighborhood of $\xi = 0$ where

$$c = \frac{n}{m + n}\,\xi + O(\xi^2).$$

The value of (11) is then $mn/\sqrt{2\pi}\,(m + n)$. The asymptotic efficiency of the median test for location for normal populations is therefore

$$E = \frac{1}{mn/4(m + n)} \left(\frac{mn}{\sqrt{2\pi}\,(m + n)}\right)^2 \Bigg/ \frac{1}{1/m + 1/n} \left(\frac{d\xi}{d\xi}\right)^2 = \frac{2}{\pi}, \tag{12}$$

the same as given by Cochran [3] for the sign test.
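The value $2/\pi$ can be confirmed numerically without using the expansion of $c$. The sketch below assumes $F$ is the standard normal distribution function and $G$ the same shifted by $\xi$, solves $mF(c) + nG(c) = (m + n)/2$ by bisection, and differentiates $mF(c)$ by a central difference. It takes the null variance of $u$ to be $mn/4(m + n)$ and compares with the mean difference, whose variance is $1/m + 1/n$. All function names are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def solve_c(m, n, xi):
    """Root c of m*F(c) + n*G(c) = (m + n)/2 with F = N(0,1), G = N(xi,1)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if m * norm_cdf(mid) + n * norm_cdf(mid - xi) < 0.5 * (m + n):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def d_mean_u(m, n, h=1e-4):
    """Central-difference derivative of m*F(c(xi)) at xi = 0."""
    up = m * norm_cdf(solve_c(m, n, h))
    down = m * norm_cdf(solve_c(m, n, -h))
    return (up - down) / (2.0 * h)


def median_test_efficiency(m, n):
    """Ratio (4) for the median test against the difference of sample means."""
    var_u = m * n / (4.0 * (m + n))   # null variance of u
    var_t = 1.0 / m + 1.0 / n         # variance of the mean difference, unit sigma
    return (d_mean_u(m, n) ** 2 / var_u) * var_t


if __name__ == "__main__":
    print(median_test_efficiency(10, 15), 2.0 / math.pi)
```

In this case the ratio equals $2/\pi$ for every pair of sample sizes, not only in the limit.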


4. Rank test for dispersion. Since the rank test (Section 5) is so successful in testing for location differences, it is natural to inquire into the efficiency of a rank test for dispersion. Let $m$ observations from a population distributed by $F(x)$ and $n$ observations from a population distributed by $G(y)$ be ranked from 1 to $m + n$, and let $W$ be the sum of squares of the deviations of the $y$ ranks from the average rank,

$$W = \sum_{i=1}^{n} \left(r_i - \frac{m + n + 1}{2}\right)^2, \tag{13}$$

where $r_i$ is the rank of the $i$th (in order of magnitude) $y$ observation. If the $x$'s are dispersed relative to the $y$'s, $W$ will be relatively small.

We first obtain the mean and variance of $W$ when $F = G$ by means of a generating function which automatically performs certain required sums,

$$g(t, x, y) = \prod_{i=1}^{m+n} \left\{ y\, t^{[i - (m+n+1)/2]^2} + x \right\}. \tag{14}$$

The probability $P(W)$ for a particular value of $W$ is the coefficient of $t^W$ in the coefficient of $x^m y^n$ divided by $\binom{m+n}{n}$. Let us denote $m + n$ by $s$ and the exponent of $t$ in (14) by $c_i$. We now differentiate (14) with respect to $t$ to get

$$\frac{\partial g}{\partial t} = \sum_{i=1}^{s} \frac{c_i\, y\, t^{c_i - 1}}{y\, t^{c_i} + x} \prod_{i=1}^{s} (y\, t^{c_i} + x). \tag{15}$$

Putting $t = 1$ we find the coefficient of $x^m y^n$ in $y(y + x)^{s-1} \sum c_i / \binom{s}{n}$ is

$$E(W) = n(s + 1)(s - 1)/12. \tag{16}$$


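For the variance, differentiating (15) again with respect to $t$, putting $t = 1$, etc., we get:

$$
\begin{aligned}
E[W(W - 1)] &= \left\{ \left[\sum c_i^2 - \sum c_i\right] \binom{s-1}{n-1} + \left[\left(\sum c_i\right)^2 - \sum c_i^2\right] \binom{s-2}{n-2} \right\} \Bigg/ \binom{s}{n} \\
&= n(s + 1)\{3m(3s^2 - 7) + 5(s - 1)[s(s + 1)(n - 1) - 12]\}/720.
\end{aligned} \tag{17}
$$

From this, under the null hypothesis

$$\sigma_W^2 = mn(s + 1)(s + 2)(s - 2)/180. \tag{18}$$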
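The null mean and variance of $W$ can be verified for small samples by listing every possible set of $y$ ranks. The sketch below assumes $W$ is the sum over the $y$ ranks of $(r_i - (m + n + 1)/2)^2$ and that, under $F = G$, all $\binom{m+n}{n}$ rank sets are equally likely; it uses exact rational arithmetic, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def w_null_moments(m, n):
    """Exact null mean and variance of W by enumerating the y rank sets."""
    s = m + n
    centre = Fraction(s + 1, 2)
    values = [
        sum((r - centre) ** 2 for r in ranks)
        for ranks in combinations(range(1, s + 1), n)
    ]
    mean = sum(values, Fraction(0)) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def w_mean_formula(m, n):
    """Closed form n(s+1)(s-1)/12 for the null mean."""
    s = m + n
    return Fraction(n * (s + 1) * (s - 1), 12)


def w_var_formula(m, n):
    """Closed form mn(s+1)(s+2)(s-2)/180 for the null variance."""
    s = m + n
    return Fraction(m * n * (s + 1) * (s + 2) * (s - 2), 180)


if __name__ == "__main__":
    print(w_null_moments(3, 4), w_mean_formula(3, 4), w_var_formula(3, 4))
```

For $m = n = 2$ the six rank sets give $W = 2.5, 2.5, 4.5, 0.5, 2.5, 2.5$, so the mean is $5/2$ and the variance $4/3$, matching both closed forms.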





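Now it is necessary to obtain $E(W)$ when $F \ne G$. We divide the $x$ axis (on which both populations are now assumed to range) into small intervals $\Delta x_\alpha$ ($\alpha = -\infty, \cdots, -1, 0, 1, 2, \cdots, \infty$) and suppose the distributions have densities $f(x)$ and $g(x)$ so that the probability an $x$ observation falls in $\Delta x_\alpha$ is $p_\alpha = f(x_\alpha)\Delta x_\alpha$, where $x_\alpha$ is in $\Delta x_\alpha$. Similarly for $y$ observations, $q_\alpha = g(x_\alpha)\Delta x_\alpha$.

Let $i_\alpha$ be the number of $x$ observations in $\Delta x_\alpha$ and $j_\alpha$ be the number of $y$ observations in $\Delta x_\alpha$ (the $\Delta x_\alpha$ are chosen sufficiently small that the probability of more than one observation in a single interval is negligible). We let $\Delta x_{\alpha_i}$ denote the interval which contains $y_i$; then the rank $r_i$ of $y_i$ is

$$r_i = \sum_{\alpha=-\infty}^{\alpha_i} (i_\alpha + j_\alpha). \tag{19}$$

Now let

$$J_\beta = j_\beta \sum_{\alpha=-\infty}^{\beta} (i_\alpha + j_\alpha). \tag{20}$$

It is clear that $J_\beta = 0$ if $\Delta x_\beta$ does not contain a $y$, and is equal to the rank of $y$ if $\Delta x_\beta$ does contain a $y$. We have then

$$
\begin{aligned}
E(W) &= E\left[\sum_{i=1}^{n} \left(r_i - \frac{m + n + 1}{2}\right)^2\right] \\
&= E\left[\sum_{i=1}^{n} r_i^2 - (m + n + 1) \sum_{i=1}^{n} r_i + \tfrac14 n(m + n + 1)^2\right] \\
&= E\left[\sum_\beta J_\beta^2 - (m + n + 1) \sum_\beta J_\beta + \tfrac14 n(m + n + 1)^2\right].
\end{aligned} \tag{21}
$$

In terms of the $p_\alpha$ and $q_\alpha$

$$
\begin{aligned}
E\left(\sum_\beta J_\beta\right) &= \lim_{\Delta x_\alpha \to 0} \left[\sum_\beta n q_\beta \sum_{\alpha=-\infty}^{\beta} m p_\alpha + E\left(\sum_\beta j_\beta^2\right) + \sum_\beta \sum_{\alpha=-\infty}^{\beta-1} n(n - 1) q_\alpha q_\beta\right] \\
&= mn \int F(x) g(x)\, dx + n + \binom{n}{2}.
\end{aligned} \tag{22}
$$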


oo B 9 
E(> J5) = lim E > iil > (i. + in | 


Aza-9 B=— 2 1 m= — 0 


8 8 


= lim EH > i > 7 (ig ty + tajy + jaty + Jajr) 
B 


a=a—O y=—O 


=limE > jj (m* Da Py + MPa bay — MPa Py 


B,a,¥ 
+ Mpajy + MP, ja + jajy\ 


= lim y» {(m" pa Py + MPabay — MPa Py)[n'gs + nqs(1 — qa)] 


B,a,¢ 





2MPa \bya(2n~ qs + ngs) + n° gy qa + 


(4 2 (3 \ © 4 
NN Gady 9B +N Gay Ie + bay(n 


nda Ga +n dads) + 26,3(2n'” qu y5 


¢ . » (8) 3 ee 6 ) 
- 543 5y8(5n qe tr én qs tT Ms); 


nm | F? dG 4 3mn | F dG + 2n°"'m 
PURE ce 


Putting the final forms of (22) and (23) in (21) gives 


E(W) = (n/12)[3(s + 1)° — 6(n + 1)(s + 1) + 2m 


(24) [ : ‘ 
+ mn} (m — 1) | F* dG + 2(n — 1) RG dG 
L 


We are now ready to compute the efficiency of $W$ relative to the standard $F$ test using

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad g(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/2\sigma^2}. \tag{25}$$

We assume the asymptotic normality of $W$ uniformly in a neighborhood of $\sigma = 1$ in order to carry out the computation. In the sense of Hoeffding [5], $W$ is a $U$ statistic; hence Hoeffding's general result on the asymptotic normality of $U$ statistics can be applied to $W$ in just the same way as by Lehmann [9] in establishing the normality of the rank test criterion for a fixed parameter value. However, we need more than this and must refer to Andrews [1] for a rigorous discussion of large sample power functions to justify such computations.

On substituting (25) in (24) and differentiating with respect to $\sigma$, we obtain


dk(w) rr eo ; 2 
= mn < (m ) f @ - IF dF + 2n 1) f (2° Lh 


a 1) | F({ (¢° — 1) ary) ) aP(a) > (g 


The third integral may be written (n — 1) [ow Na 


$$\left.\frac{dE(W)}{d\sigma}\right|_{\sigma=1} = mn(s - 2) \int (x^2 - 1) F^2\, dF(x) = mn(s - 2)/2\pi\sqrt{3}, \tag{27}$$

using the evaluation of the integrals given by Jones [7].





For the $F$ test we have $\mu^*(\sigma) = 1/\sigma^2$ and the variance under the null hypothesis is $\sigma_F^2 = 2(m + n)/mn$. Substituting these results together with (18) and (27) into (4), we find the asymptotic efficiency of $W$ to be

$$E = \frac{180}{mn(m + n)^3} \left[\frac{mn(m + n)}{2\pi\sqrt{3}}\right]^2 \Bigg/ \frac{mn}{2(m + n)}\, 2^2 = \frac{15}{2\pi^2}, \tag{28}$$

which is about 76%. For one-sided tests, the square root of this figure is about 87%.
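The figures 76% and 87% can be reproduced from the ingredients quoted in this section. The sketch below assumes the null variance of $W$ is $mn(s + 1)(s + 2)(s - 2)/180$, that the derivative of $E(W)$ at $\sigma = 1$ is $mn(s - 2)/2\pi\sqrt{3}$, and that the constant $1/2\pi\sqrt{3}$ arises as the integral of $(x^2 - 1)\Phi(x)^2\phi(x)$ over the real line. That last identification is an assumption made here and is checked numerically rather than taken from Jones [7]. All function names are illustrative.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def squared_cdf_integral(step=1e-3, limit=10.0):
    """Trapezoid estimate of the integral of (x^2 - 1) * Phi(x)^2 * phi(x)."""
    steps = int(round(2.0 * limit / step))
    total = 0.0
    for i in range(steps + 1):
        x = -limit + i * step
        value = (x * x - 1.0) * norm_cdf(x) ** 2 * phi(x)
        total += value if 0 < i < steps else 0.5 * value
    return total * step


def w_efficiency(m, n):
    """Ratio (4) for W against the F test, using the quoted finite-sample pieces."""
    s = m + n
    var_w = m * n * (s + 1) * (s + 2) * (s - 2) / 180.0
    d_w = m * n * (s - 2) / (2.0 * math.pi * math.sqrt(3.0))
    var_f = 2.0 * (m + n) / (m * n)
    d_f = -2.0                       # derivative of 1/sigma^2 at sigma = 1
    return (d_w ** 2 / var_w) / (d_f ** 2 / var_f)


if __name__ == "__main__":
    print(squared_cdf_integral(), 1.0 / (2.0 * math.pi * math.sqrt(3.0)))
    print(w_efficiency(10 ** 6, 10 ** 6), 15.0 / (2.0 * math.pi ** 2))
```

For finite samples the ratio is $15\,s(s - 2)/2\pi^2(s + 1)(s + 2)$, which tends to $15/2\pi^2 \approx 0.76$, and its square root is about 0.87.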


5. Rank test for location. Van der Vaart [16] announced that the Wilcoxon [20] test for location had an asymptotic efficiency of $3/\pi$. However, Pitman [15] seems to have priority¹. Since there is no readily accessible derivation of this result in print, it is perhaps worth a brief computation here. Using the same notations as before, let $U$, the test criterion, be the number of times a $y$ precedes an $x$ in an ordered sample of $m$ $x$'s and $n$ $y$'s. Then

$$U = \sum_{\alpha=-\infty}^{\infty} j_\alpha \sum_{\beta=\alpha}^{\infty} i_\beta. \tag{29}$$


Its expectation may be found easily by the method of the previous section. In this case, however, the result is obtained immediately by Mann and Whitney [12] to be

$$E(U) = mn \int_{-\infty}^{\infty} [1 - F(x)]\, g(x)\, dx. \tag{30}$$

Under the null hypothesis, $\sigma_U^2 = mn(m + n + 1)/12$. On putting (10) in (30) we find $d\mu/d\xi = -mn/2\sqrt{\pi}$. The asymptotic efficiency is

$$E = \lim \frac{12}{mn(m + n + 1)} \left(\frac{mn}{2\sqrt{\pi}}\right)^2 \Bigg/ \frac{1}{1/m + 1/n} \left(\frac{d\xi}{d\xi}\right)^2 = \frac{3}{\pi}. \tag{31}$$
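The rank test computation can be checked in the same way. The sketch below assumes unit-variance normal populations with the $y$ population shifted by $\xi$, so that the Mann and Whitney expectation $mn\int[1 - F(x)]g(x)\,dx$ reduces to $mn\,\Phi(-\xi/\sqrt{2})$. That closed form is a reduction made here, not a formula from the paper, and the function names are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_u(m, n, xi):
    """E(U) = mn * Pr(y < x) for x ~ N(0,1), y ~ N(xi,1), i.e. mn * Phi(-xi/sqrt 2)."""
    return m * n * norm_cdf(-xi / math.sqrt(2.0))


def d_expected_u(m, n, h=1e-5):
    """Central-difference derivative of E(U) with respect to xi at xi = 0."""
    return (expected_u(m, n, h) - expected_u(m, n, -h)) / (2.0 * h)


def rank_test_efficiency(m, n):
    """Ratio (4) for U against the difference of sample means."""
    var_u = m * n * (m + n + 1) / 12.0   # null variance of U
    var_t = 1.0 / m + 1.0 / n            # variance of the mean difference
    return (d_expected_u(m, n) ** 2 / var_u) * var_t


if __name__ == "__main__":
    print(d_expected_u(3, 4), -12.0 / (2.0 * math.sqrt(math.pi)))
    print(rank_test_efficiency(10 ** 6, 10 ** 6), 3.0 / math.pi)
```

For finite samples the ratio is $3(m + n)/\pi(m + n + 1)$, which tends to $3/\pi \approx 0.955$; the sign of the derivative does not matter because it enters squared.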


Lehmann and Stein [8] have shown that Pitman's [14] randomization test is the most powerful nonparametric test for normal alternatives. Its asymptotic efficiency is shown by Hoeffding [6] to be unity. Dwass [4] has shown that the most powerful rank order test also has unit asymptotic efficiency.


6. Run test for location and dispersion. Pitman [15] found that the Wald 
and Wolfowitz [17] run test has zero asymptotic efficiency for testing either 
location or dispersion. These results may be computed at once by putting (10) 
or (25) in Wolfowitz’s [21] expression for the expected total number of runs 


rp“ 2mn f(x) g(x) 
. . dx 
om f(x) + n g(x) 

' These results are given in ‘ecture notes prepared for a course at Columbia University 
given in 1948; several copies were distributed to various statisticians in the U.S. but no 
other copies are available. The author borrowed a copy after being informed of its « 
by a referee. 


istence 







to find $d\mu/d\theta$ or $d\mu/d\sigma$. Then (4) may be used with the $t$ or $F$ test for $T^*$. In this
connection, I. R. Savage has raised a question about the proper use of the
asymptotic normality of the run criterion; see Lehmann [10].


7. Lehmann's rank test for dispersion. Another test for dispersion described
by Lehmann [9] consists of forming all the $\binom{m}{2}$ positive differences between the
$m$ observations on $x$ and the $\binom{n}{2}$ positive differences between the $n$ observations
on $y$; the rank test is then applied to these differences. Let $V$ be the number of
times $x$ differences exceed $y$ differences, and let the $x$ and $y$ populations have
densities $f(x)$ and $g(y)$. A tentative value of $27/4\pi^2$ (about 68%) has been
computed for the asymptotic efficiency of $V$ relative to the standard $F$ test, using
methods analogous to those of Section 4.


However, V is not completely distribution-free; its distribution depends on 
the form of f(x) even when f = g. The mean is independent of the form 


$E(V \mid f = g) = m(m - 1)n(n - 1)/8$

but I have been unable to show that the variance of $V$ or even the large sample
variance is independent of $f(x)$. The calculation of the value $27/4\pi^2$ used the
unproved assumption that the asymptotic variance is independent of $f$ when
$f = g$, in that numerous eight-fold integrals were evaluated using the exponential
instead of the normal distribution.


REFERENCES 
[1] F. C. Andrews, "Asymptotic behavior of some rank tests for analysis of variance,"
Ann. Math. Stat., to be published.
[2] G. W. Brown and A. M. Mood, "On median tests for linear hypotheses," Proceedings
of the Second Berkeley Symposium on Mathematical Statistics and Probability,
University of California Press, 1951, pp. 159-166.
[3] W. G. Cochran, "The efficiencies of the binomial series test of significance of a mean
and of a correlation coefficient," J. Roy. Stat. Soc., Vol. C, Part I (1937), pp. 69-73.
[4] M. Dwass, "On the asymptotic normality of certain rank order statistics," Ann. Math.
Stat., to be published.
[5] W. Hoeffding, "A class of statistics with asymptotically normal distribution," Ann.
Math. Stat., Vol. 19 (1948), pp. 293-326.
[6] W. Hoeffding, "Large-sample power of tests based on permutations of observations,"
Ann. Math. Stat., Vol. 23 (1952), pp. 169-192.
[7] H. L. Jones, "Exact lower moments of order statistics in small samples from a normal
distribution," Ann. Math. Stat., Vol. 19 (1948), pp. 270-273.
[8] E. L. Lehmann and C. Stein, "On some theory of nonparametric hypotheses," Ann.
Math. Stat., Vol. 20 (1949), pp. 28-46.
[9] E. L. Lehmann, "Consistency and unbiasedness of certain nonparametric tests,"
Ann. Math. Stat., Vol. 22 (1951), pp. 165-180.
[10] E. L. Lehmann, "The power of rank tests," Ann. Math. Stat., Vol. 24 (1953), pp. 23-43.
[11] H. Levene, "On the power function of tests of randomness based on runs up and
down," Ann. Math. Stat., Vol. 23 (1952), pp. 34-56.
[12] H. B. Mann and D. R. Whitney, "On a test of whether one of two random variables
is stochastically larger than the other," Ann. Math. Stat., Vol. 18 (1947), pp. 50-61.
[13] G. E. Noether, "Asymptotic properties of the Wald-Wolfowitz test of randomness,"
Ann. Math. Stat., Vol. 21 (1950), pp. 231-246.
[14] E. J. G. Pitman, "Significance tests which may be applied to samples from any
populations," J. Roy. Stat. Soc., Suppl. Vol. 4 (1937), pp. 117-130.
[15] E. J. G. Pitman, Lecture notes on nonparametric statistical inference, 1949.
[16] H. R. Van der Vaart, "Some remarks on the power function of Wilcoxon's test for
the problem of two samples," Nederl. Akad. Wetensch., Proc., Ser. A, Vol. 53
(1950), pp. 494-520.
[17] A. Wald and J. Wolfowitz, "On a test of whether two samples are from the same
population," Ann. Math. Stat., Vol. 11 (1940), pp. 147-162.
[18] J. Westenberg, "Significance test for median and interquartile range in samples
from continuous populations of any form," Nederl. Akad. Wetensch., Proc., Ser.
A, Vol. 51 (1948), pp. 252-261.
[19] J. Westenberg, "A tabulation of the median test for unequal samples," Nederl. Akad.
Wetensch., Proc., Ser. A, Vol. 53 (1950), pp. 77-82.
[20] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics, Vol. 1
(1945), pp. 80-83.
[21] J. Wolfowitz, "Nonparametric statistical inference," Proceedings of the Berkeley
Symposium on Mathematical Statistics and Probability, University of California
Press, 1949, pp. 93-113.





ON CERTAIN CONFIDENCE CONTOURS FOR DISTRIBUTION 
FUNCTIONS 


By Sten Malmquist
Institute of Statistics, Uppsala, Sweden 


0. Summary. By a generalisation of a theorem by Doob, certain confidence 
or acceptance contours for distribution functions are obtained. The power of 
tests based on such contours is briefly discussed and some approximate results 
derived. Using the aforementioned generalisation of Doob’s theorem, the limit- 
ing joint probability distribution of the coordinates of the maximum deviation 


between a sample distribution and the corresponding parent distribution is 
evaluated. 


1. Introduction. Let $F_N^*(U)$ be the empirical distribution function for a sample
of $N$ mutually independent observations on a statistical variate with a continuous
distribution function $F(U)$. Consider the process

$X_N(U) = [F_N^*(U) - F(U)]\sqrt{N}.$

Given $F_N^*(U)$, the probabilities for events such as

(A) $G_1[F(U)] \le X_N(U) \le G_2[F(U)],$

for all $U$, can be used for testing the hypothesis that $F(U)$ is a given function.
The situation is illustrated in Fig. 1. The interval AB is called an acceptance
interval, the interval CD is a confidence interval. Allowing $U$ to vary, acceptance
and confidence regions are obtained (cf. [3], p. 515; [13]).

Probabilities for an event such as (A) have the attractive property of being
independent of the distribution function $F(U)$. One may suppose that $F(U) = U$,
with $0 \le U \le 1$.

The limiting distributions

(B) $\lim_{N \to \infty} \Pr\{X_N(U) \le G_2(U),\ 0 \le U \le 1\}$ (one-sided alternative),

(C) $\lim_{N \to \infty} \Pr\{G_1(U) \le X_N(U) \le G_2(U),\ 0 \le U \le 1\}$ (two-sided alternative),

where the inequalities within brackets should be fulfilled for every $U$, have been
derived by Kolmogorov [7] for $G_1(U) = -a$ and $G_2(U) = a$. Related problems
have been considered by Smirnov [12].

Doob [5] has demonstrated the Kolmogorov and Smirnov theorems by replacing
the process $X_N(U)$ by a Gaussian process $X(U)$ with the same correlation
function. This transformation has been justified by Donsker [4]. The process
$X(U)$ is then transformed to a Wiener process $W(t)$, and limiting probabilities
concerning $X_N(U)$ are transformed to probabilities concerning $W(t)$.


Fig. 1 (horizontal axis: $F(U) = t$)


For finite $N$, probability distributions involving confidence contours $G(U) = a$
have been tabulated by Massey [10], [11] and Birnbaum and Tingey [2]. A general
method for calculating distributions (C) above has been developed by Anderson
and Darling [1] (cf. [6]).

For a given $U$, $F_N^*(U)$ follows the binomial distribution and we have, supposing
$F(U) = U$ with $0 \le U \le 1$,

$E\{X_N(U)\} = 0, \quad E\{X_N(U_1)X_N(U_2)\} = U_1(1 - U_2), \quad 0 < U_1 \le U_2 < 1.$

In particular, $E\{X_N^2(U)\} = U(1 - U)$ for $0 < U < 1$. This shows that the
variance decreases towards the tails of the distribution. Thus, in constructing
the contours $G_1(U) \le X_N(U) \le G_2(U)$ it seems reasonable to let the width
$G_2(U) - G_1(U)$ decrease towards the ends of the distribution (cf. Section 5). In
absence of general principles in this respect, the form $-G_1(U) = G_2(U) =
a\sqrt{U(1 - U)}$ with exclusion of the points $U = 0$ and $U = 1$ has been suggested
by Anderson and Darling [1].
If, for example,

(D) $G_2(U) = \begin{cases} (a - b)U + b, & 0 \le U \le \tfrac{1}{2}, \\ (b - a)U + a, & \tfrac{1}{2} \le U \le 1, \end{cases}$

then deviations of $X_N(U)$ at the extremes of the distribution will have a greater
chance of being detected. Or, the width of the confidence contours can be made
smaller at the tails than at the middle of the distribution. Naturally, here also
general principles for the choice of $a$ and $b$ are lacking.

We may also be interested in the deviations $X_N(U)$ for a certain part of the
distribution $F(U)$. That is, $G_2(U)$ should for example be of the form

(E) $G_2(U) = \begin{cases} aU + b, & A \le U \le B, \\ (1 - U)\sqrt{N}, & \text{elsewhere.} \end{cases}$







In Section 3 below the limiting probabilities for inequalities of type (B) are 
evaluated for the form of $G(U)$ indicated by (D) and (E). For the derivation
of the probabilities in Section 3 we use generalisations of theorems proved by 
Doob [5] concerning the Wiener process. These generalisations, proved in Sec- 
tion 2, follow very simply from the symmetric property of the aforementioned 
Gaussian process X(U). The theorem proved in Section 2 is in agreement with 
the fact that the Wiener process is continuous with probability one and may be 
approximated by a discrete Gaussian process with a corresponding correlation 
matrix. 

In Section 4, a numerical example is given of upper and lower limits for the 
power function in the case when one of the earlier derived limiting probability 
distributions is used for testing a specified normal distribution against a certain 
other normal distribution. The limits, which are capable of improvement, are 
rather wide and indicate for the example chosen a relative power of about 60 
per cent, compared with the most powerful test. This comparatively low value 
is not surprising, considering the general nature of the testing procedure used. 

From the aforementioned properties of the $X_N(U)$ process it follows that a
large deviation between the empirical and theoretical distribution functions
$F_N^*(U)$ and $F(U)$ is more probable in the middle of the distribution than at the
extremes. It may therefore be of interest to consider both coordinates of the
maximum of $X_N(U)$. The joint and conditional distributions involved are
derived in Section 5. (Corresponding expressions for $W(t)$ are given by Lévy [8],
chap. 6.)


2. Generalisation of a theorem by Doob. Let $W(t)$, $0 \le t < \infty$, be a Gaussian
process for which

$\Pr\{W(0) = 0\} = 1, \quad E\{W(t)\} = 0, \quad E\{W(s)W(t)\} = s, \quad s \le t.$

The process $W(t)$, called the Brownian movement process or the (normalised)
Wiener process, has uncorrelated, and thus independent, increments.
For the Gaussian process $W_{t_0}(t')$ defined for $t' = t - t_0$ by

$W_{t_0}(t') = W(t) - W(t_0),$

we have $E\{W_{t_0}(t')\} = 0$. Further, for $t_1 \le t_2$,

$E\{W_{t_0}(t_1)W_{t_0}(t_2)\} = E\{[W(t_1 + t_0) - W(t_0)][W(t_2 + t_0) - W(t_0)]\} = t_1.$

Thus, $W_{t_0}(t')$ is also a Wiener process.


It has been proved by Doob [5] that

(1) $\Pr\{W(t) \le at + b\} = 1 - e^{-2ab}, \quad a \ge 0,\ b > 0;\ \text{all } t,\ 0 \le t < \infty.$

The inequality within brackets must be fulfilled for every $t$ in the given interval.
The following generalisation will be proved.







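Theorem 1: Let the process $W(t)$ pass through the points $\{x, s_1\}$ and $\{y, s_2\}$,
with $x \le y$. Then

$\Pr\{W(t) \le at + b,\ x \le t \le y \mid W(x) = s_1,\ W(y) = s_2\} = 1 - \exp\left\{-2\, \dfrac{P_1 - s_1}{\sqrt{x}}\, \dfrac{P_2 - s_2}{\sqrt{y}}\, \dfrac{R}{1 - R^2}\right\},$

where $R = \sqrt{x/y}$, and

$s_1 \le P_1 = ax + b, \quad s_2 \le P_2 = ay + b.$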


For the proof, we shall use the following transformations due to Doob [5].
First, a process $X(U)$ is defined by

(2) $X(U) = \dfrac{1}{t + 1}\, W(t), \quad 0 \le t < \infty; \quad U = \dfrac{t}{t + 1}, \quad 0 \le U \le 1.$

For the Gaussian process $X(U)$ we then have

$E\{X(U)\} = 0, \quad E\{X(U)X(V)\} = U(1 - V), \quad 0 \le U \le V \le 1.$

Further, if $U_1 = 1 - U$ and $V_1 = 1 - V$, then $E\{X(U_1)X(V_1)\} =
E\{X(U)X(V)\}$. Making use of this symmetric property of the $X(U)$ process,
we have

$\Pr\{X(U) \le f(U),\ 0 < U \le U' \mid X(U') = z'\} = \Pr\{X(U) \le f(1 - U),\ 1 - U' \le U < 1 \mid X(1 - U') = z'\},$

where $f(U') \ge z'$. By applying the transformation (2) we obtain


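(3) $\Pr\left\{W(t) \le (t + 1) f\!\left(\dfrac{t}{t + 1}\right),\ 0 < t \le t' \,\Big|\, W(t') = s'\right\} = \Pr\left\{W(t) \le (t + 1) f\!\left(\dfrac{1}{t + 1}\right),\ \dfrac{1}{t'} \le t < \infty \,\Big|\, W\!\left(\dfrac{1}{t'}\right) = \dfrac{s'}{t'}\right\},$

where $t' = U'/(1 - U')$ and $s' = z'(t' + 1)$. In the case when $f(U) = (a - b)U + b$,
we have

$(t + 1) f\!\left(\dfrac{t}{t + 1}\right) = at + b; \quad (t + 1) f\!\left(\dfrac{1}{t + 1}\right) = bt + a,$

and

(4) $\Pr\{W(t) \le at + b,\ 0 < t \le t' \mid W(t') = s'\} = \Pr\left\{W(t) \le bt + a,\ \dfrac{1}{t'} \le t < \infty \,\Big|\, W\!\left(\dfrac{1}{t'}\right) = \dfrac{s'}{t'}\right\}.$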


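Using the fact that $W(t) - W(1/t')$ is also a Wiener process, we further have

$\Pr\{W(t) \le bt + a,\ 1/t' \le t < \infty \mid W(1/t') = s'/t'\} = \Pr\{W(t) \le bt + b/t' + a - s'/t',\ 0 < t < \infty\}.$

Thus from (1), (3), and (4),

(5) $\Pr\{W(t) \le at + b,\ 0 < t \le t' \mid W(t') = s'\} = \Pr\left\{W(t) \le bt + \dfrac{P' - s'}{t'},\ 0 < t < \infty\right\} = 1 - \exp\left\{-2b\, \dfrac{P' - s'}{t'}\right\}, \quad s' \le P' = at' + b.$

Further, according to the above mentioned property of the $W(t)$ process,

$\Pr\{W(t) \le at + b,\ x \le t \le y \mid W(x) = s_1,\ W(y) = s_2\} = \Pr\{W(t) \le at + ax + b - s_1,\ 0 < t \le y - x \mid W(y - x) = s_2 - s_1\}.$

Finally from (5), taking $t' = y - x$ and $s' = s_2 - s_1$,

(6) $\Pr\{W(t) \le at + b,\ x \le t \le y \mid W(x) = s_1,\ W(y) = s_2\} = 1 - \exp\{-2(ax + b - s_1)(P_2 - s_2)/(y - x)\} = 1 - \exp\left\{-2\, \dfrac{P_1 - s_1}{\sqrt{x}}\, \dfrac{P_2 - s_2}{\sqrt{y}}\, \dfrac{R}{1 - R^2}\right\},$

where $R = \sqrt{x/y}$ with $P_1 = ax + b \ge s_1$ and $P_2 = ay + b \ge s_2$.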
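The two ways of writing the conditional probability in Theorem 1 can be compared numerically. The Python sketch below is only an illustration: the function names are hypothetical, and the formulas are the reading of the theorem adopted here, namely $1 - \exp\{-2(P_1 - s_1)(P_2 - s_2)/(y - x)\}$ and its equivalent in terms of $R = \sqrt{x/y}$.

```python
import math

def stay_below_line(a, b, x, y, s1, s2):
    """Pr{W(t) <= a*t + b for x <= t <= y | W(x) = s1, W(y) = s2}."""
    p1, p2 = a * x + b, a * y + b
    if s1 > p1 or s2 > p2:
        raise ValueError("both end points must lie on or below the line")
    return 1.0 - math.exp(-2.0 * (p1 - s1) * (p2 - s2) / (y - x))

def stay_below_line_r_form(a, b, x, y, s1, s2):
    """Same probability written with R = sqrt(x / y); requires 0 < x < y."""
    p1, p2 = a * x + b, a * y + b
    r = math.sqrt(x / y)
    exponent = (-2.0 * ((p1 - s1) / math.sqrt(x))
                * ((p2 - s2) / math.sqrt(y)) * r / (1.0 - r * r))
    return 1.0 - math.exp(exponent)

print(stay_below_line(0.5, 1.0, 1.0, 3.0, 0.2, 1.0))
print(stay_below_line_r_form(0.5, 1.0, 1.0, 3.0, 0.2, 1.0))
```

The identity used is $R/[(1 - R^2)\sqrt{xy}] = 1/(y - x)$, so the two functions should agree up to rounding. Setting $x = 0$ and $s_1 = 0$ in the first function gives the special case $1 - \exp\{-2b(P' - s')/t'\}$.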
The same method can be used to generalise the result in [5] concerning
two-sided probabilities. The following result, obtained by using formula (4.3),
p. 398, of Doob [5], will be given without proof:

(7) $\Pr\{|W(t)| \le at + b,\ x \le t \le y \mid W(x) = s_1,\ W(y) = s_2\}$
$= 1 - \sum_{m=1}^{\infty} \Big( \exp\{-\kappa\,[(2m - 1)P_1 - s_1][(2m - 1)P_2 - s_2]\} + \exp\{-\kappa\,[(2m - 1)P_1 + s_1][(2m - 1)P_2 + s_2]\}$
$\qquad - \exp\{-\kappa\,[(2mP_1 - s_1)(2mP_2 + s_2) + s_1 s_2]\} - \exp\{-\kappa\,[(2mP_1 + s_1)(2mP_2 - s_2) + s_1 s_2]\} \Big), \quad \kappa = \dfrac{2R}{(1 - R^2)\sqrt{xy}},$

with $R = \sqrt{x/y}$ while $P_1 = ax + b$ and $P_2 = ay + b$.

Finally, a remark concerning the one-sided and two-sided alternatives. Let
$a$ be the event that $W(t)$ passes $G_1(t)$ and $b$ the event that $W(t)$ passes $G_2(t)$.
Then

$P\{G_1(t) \le W(t) \le G_2(t)\} = 1 - P(a + b) = 1 - P(a) - P(b) + P(ab).$

According to the correlation properties of the $W(t)$ process, we have $P(ab) \le
P(a)P(b)$. Then for small values of $P(a)$ and $P(b)$,

$1 - P(a + b) \approx 1 - P(a) - P(b).$







Even if P(a) and P(b) are moderately small, P(ab) may be neglected. In that 
case the probability for a two-sided alternative may be computed from the prob- 
abilities for the one-sided alternatives involved. 
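How small the neglected term $P(ab)$ is can be illustrated for the constant contours $G_1 = -a$, $G_2 = a$ applied to the limiting process $X(U)$. The Python sketch below is an illustration under assumptions not spelled out in the text: it uses the classical Kolmogorov series $1 - 2\sum_{m \ge 1} (-1)^{m-1} e^{-2m^2a^2}$ for the two-sided limit and $e^{-2a^2}$ for each one-sided exceedance probability; the function names are hypothetical.

```python
import math

def one_sided_exceed(a):
    """Limiting probability that X(U) passes the level a somewhere in (0, 1)."""
    return math.exp(-2.0 * a * a)

def two_sided_stay(a, terms=50):
    """Kolmogorov's limiting probability that |X(U)| <= a for all U."""
    series = sum((-1) ** (m - 1) * math.exp(-2.0 * m * m * a * a)
                 for m in range(1, terms + 1))
    return 1.0 - 2.0 * series

for a in (0.8, 1.0, 1.36):
    exact = two_sided_stay(a)
    approx = 1.0 - 2.0 * one_sided_exceed(a)   # neglects P(ab)
    print(a, exact, approx, exact - approx)
```

The difference between the two columns is the neglected joint term, whose leading part is $2e^{-8a^2}$; for contours wide enough to be of practical use it is far smaller than the one-sided probabilities themselves.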


3. Some examples. We shall now consider some examples of special interest
concerning the use of the relations in Section 2.

A. First we consider

$P_1 = \Pr\{W(t) \le at + b,\ 0 \le t \le 1;\ W(t) \le bt + a,\ 1 \le t < \infty\}.$

Using (4) and (5), we get

(8) $P_1 = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{a+b} \left[1 - e^{-2b(a + b - s)}\right]^2 e^{-s^2/2}\, ds = \Phi(a + b) - 2e^{-2ab}\,\Phi(a - b) + e^{-4b(a - b)}\,\Phi(a - 3b),$

where $\Phi(x)$ is the normal (cumulative) distribution function with mean zero
and unit variance. Further, using the transformation (2) in Section 2 we have

$P_1 = \Pr\{X(U) \le (a - b)U + b,\ 0 < U \le \tfrac{1}{2};\ X(U) \le (b - a)U + a,\ \tfrac{1}{2} \le U < 1\}.$

Then, replacing $X(U)$ by $X_N(U)$ as indicated in Section 2, for $F(U^*) = \tfrac{1}{2}$,

$P_1 = \lim_{N \to \infty} \Pr\{[F_N^*(U) - F(U)]\sqrt{N} \le (a - b)F(U) + b,\ -\infty < U \le U^*;\ [F_N^*(U) - F(U)]\sqrt{N} \le (b - a)F(U) + a,\ U^* \le U < \infty\}.$

An expression of the above type for $a > b$ gives greater weight to deviations at
the extremes of $F(U)$ than does the ordinary expression with $a = b$.
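The closed form for $P_1$ can be checked against direct numerical integration. The Python sketch below assumes the reading $P_1 = \Phi(a+b) - 2e^{-2ab}\Phi(a-b) + e^{-4b(a-b)}\Phi(a-3b)$, obtained by conditioning on $W(1) = s$ so that each half of the contour contributes a factor $1 - e^{-2b(a+b-s)}$. The function names, the truncation point of the integral, and the use of Simpson's rule are choices made for this illustration only.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def p1_closed_form(a, b):
    return (PHI(a + b)
            - 2.0 * math.exp(-2.0 * a * b) * PHI(a - b)
            + math.exp(-4.0 * b * (a - b)) * PHI(a - 3.0 * b))

def p1_by_quadrature(a, b, lower=-12.0, steps=20000):
    """Simpson's rule for the integral over s from `lower` to a + b."""
    upper = a + b
    h = (upper - lower) / steps          # steps must be even

    def integrand(s):
        stay = 1.0 - math.exp(-2.0 * b * (upper - s))
        return stay * stay * math.exp(-0.5 * s * s)

    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)

print(p1_closed_form(1.2, 0.8), p1_by_quadrature(1.2, 0.8))
print(p1_closed_form(1.0, 1.0), 1.0 - math.exp(-2.0))
```

For $a = b$ the closed form collapses to $1 - e^{-2a^2}$, the familiar one-sided limit, which is what the second printed line compares.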
B. Next we consider 


sat+hb,0O <tisl,; 
Pr (W(t) b>O0:a+b>0 


2 


~(at+b);1St<« 


< (a —»b)F(U) +b; —= 


lim Py (FRU) F(U)\~ N 
(a — b)F(U) b: U* < 


where, as before, /(U*) \. We have 


Py ; — : i a 
V 2r +b 


| 26(—a 2 | a 3b)| + ¢ - lb(3a b) — bla 3b)| 


Probabilities of this type can be used, for example, in cases when the dispersion
in F*(U) is of particular interest.





C. Finally we consider

$P_3 = \Pr\{|W(t)| \le at + b,\ x \le t \le y\} = \lim_{N \to \infty} \Pr\{|F_N^*(U) - F(U)|\sqrt{N} \le (a - b)F(U) + b,\ U' \le U \le U''\},$

where $F(U') = x/(1 + x)$ and $F(U'') = y/(1 + y)$. From (7) we get


l Pi Pe 
P; = j | ls | la, P” 
2rV 1 — RV oy ri 7 Pa a 


| nf ( 8) ) 2Rs, 8o 4 (5 ) } 
“P30 — RD) L\ Vx Vary Vy/ J)’ 


With some reduction we obtain 


] a+] V2 + pe Vu , a . ) 
P; ; ds / Iso exp 4 8; — 2Rs, s+ 8 
2rv | R? Jp in 1 Lrefe dso @Xx] a1 — R) [sj vsy + | 


| “ a Ao Be wai 
22, (-1)""e"" a | Is | 82 Exp < 
rR’ 2 i dss EXP \ 57 — R*) 


2rv | m= 1 JA) 


+ [81 + 2Rs; % + sly, 


where 


Ay (pi + 2axm) 2, By — (po + 2bm) Vy, (py = ax + b, 
As (py 2arm) V/ x, B, = (p. — 2bm) Vy, \ pp» = ay + b. 


This distribution, for $a = b$, has been evaluated by Anderson and Darling [1]
and Maniya [9].


4. Power functions. It may be of interest to compare the power of the one- 
sided test for a certain class of hypotheses for which the power of the most power- 
ful test is known. As an example we choose the case where a normal distribution 
with, let us say, mean zero and unit variance is tested against the class of normal 
distributions with the same variance but a negative mean. 

Let $\Phi(U)$ be the normal cumulative distribution function with mean zero and
unit variance and $\varphi(U) = \Phi'(U)$ the corresponding frequency function. Thus
the counterhypotheses are $H(U) = \Phi(U + m/\sqrt{N})$, with $m > 0$. We define,
for $z = H(U)$,

$K[z] = K[H(U)] = \{\Phi(U + m/\sqrt{N}) - \Phi(U)\}\sqrt{N} = m\varphi(U) + O(1/\sqrt{N}).$
Further, omitting terms of lower order, 


we on Kel an nga... Sita 


64 







We then get (cf. Fig. 2) 


/ 
| Az, 
‘ 


r - K 7 = 
K{z] 2 Alz] \A(1 — 2), 


2me(0) 


: (Be + C C2s }- —mz 
K[z] s K(x) = ; , : : 
[7] K(z| BA -—-a2) +, .S32< 1: C m2z0(z) + m¢(z); 


where —« < z¢ S 0. Now consider the probability 
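$P(a, b) = P_1 = \lim_{N \to \infty} \Pr\{[F_N^*(U) - F(U)]\sqrt{N} \le (a - b)F(U) + b,\ -\infty < U \le U^*;\ [F_N^*(U) - F(U)]\sqrt{N} \le (b - a)F(U) + a,\ U^* \le U < \infty\},$

with $F(U^*) = \tfrac{1}{2}$. According to (8) we have

$P(a, b) = \Phi(a + b) - 2e^{-2ab}\,\Phi(a - b) + e^{-4b(a - b)}\,\Phi(a - 3b).$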
If the hypothesis $H(U)$ is true, we have, omitting terms of lower order,

$P = P_H = \lim_{N \to \infty} \Pr\{[H_N^*(U) - H(U)]\sqrt{N} \le (a - b)H(U) + b - K[H(U)],\ -\infty < U \le U^*;\ [H_N^*(U) - H(U)]\sqrt{N} \le (b - a)H(U) + a - K[H(U)],\ U^* \le U < \infty\},$

with $H(U^*) = \tfrac{1}{2}$. Thus
Py = Pia—- B—-C,b—C) < Py < Pla 1,b) = Pa. 


Now, $1 - P(a, b) = \alpha$ is the probability of rejecting the hypothesis $F$ when it
is true. Let $\Pr\{H, \alpha\} = 1 - P_H$ be the probability of rejecting $F$ when $H$ is
true. We then have, for the power function $\Pr\{H, \alpha\}$,


1 — Pa = Pr < PriH,a} < Pr 1— P,. 


On the other hand, from the assumptions concerning $F$ and $H$ it follows that
the most powerful testing procedure should be to compute from the observations
$x$ the normally distributed variable $M_N = \sum x_i/\sqrt{N}$, and reject $F$ if $M_N < \lambda$,
where $\int_{-\infty}^{\lambda} d\Phi(x) = \alpha$.

If $H$ is true, then the probability of rejecting $F$ will be $\Pr_{\max} = \int_{-\infty}^{\lambda + m} d\Phi(x)$.
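The power of the most powerful test is easy to evaluate for a given $\alpha$ and $m$. The Python sketch below assumes the reading $\Pr_{\max} = \Phi(\lambda + m)$ with $\lambda = \Phi^{-1}(\alpha)$; the function name `most_powerful_power` is hypothetical, and the $\alpha$ values are the rounded ones from the table in this section.

```python
from statistics import NormalDist

def most_powerful_power(alpha, m):
    """Power of the test that rejects F when M_N < lambda, where Phi(lambda) = alpha."""
    std = NormalDist()                 # mean 0, unit variance
    lam = std.inv_cdf(alpha)           # critical value lambda
    return std.cdf(lam + m)            # under H, M_N is normal with mean -m

for alpha in (0.011, 0.012, 0.029):
    print(alpha, most_powerful_power(alpha, 1.0))
```

Because the tabulated $\alpha$ values are themselves rounded to three decimals, the computed powers can only be expected to agree with the tabulated $\Pr_{\max}$ column to about the same precision.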


a 


The following table shows, for some values of $a$ and $b$ and for $m = 1$ and $z =
-2\varphi(0)$, the probability $\alpha$, the lower and upper limits for the power function for the
test in question, and $\Pr_{\max}$, the value of the power function for the most
powerful test.


α        lower limit   upper limit   Pr_max
0.011    0.044         0.054         0.098
0.012    0.052         0.094         0.105
0.029    0.103         0.169         0.186


The values in rows 1 and 2 indicate that for the hypothesis in question, we would 
prefer to take a > b. 


5. The joint distribution of the coordinates of the maximum deviation. So 
far, we have considered only one of the coordinates for the maximum deviation. 
We shall now also consider the location on the U-axis of this maximum. For sake 
of simplicity, we deal only with one-sided alternatives. The corresponding two- 
sided alternatives can be treated in the same manner. 

Let $K(a, U)$ denote the probability that the maximum value of the Gaussian
process $X(U)$ defined by (2) is found for $U^* \le U$ and that this maximum value
is smaller than or equal to $a$. Thus, $K(a, U)$ is the joint (cumulative) distribution
function for the coordinates for the maximum of $X(U)$. Further, let the
corresponding frequency function be $k(a, U)$, the marginal frequency functions be
$g(a)$ and $h(U)$, and the conditional ones be $g_U(a)$ and $h_a(U)$, respectively.

From (1) we have $g(a) = 4ae^{-2a^2}$.

To evaluate $k(a, U)$, we calculate the probability, say $P$, that the process
$X(U)$, before reaching the ordinate through the point $U$, oversteps the line
$x = a$, but not the line $x = a + da$, and that $X(U)$ does not overstep the line
$x = a$ after $U$. We then have $k(a, U) = \partial P/\partial U$.

Transforming to the $W(t)$ process, $P$ is equal to the probability that $W(t)$,
before reaching a vertical line through $t = U/(1 - U)$, oversteps the line $a(t + 1)$
but not the line $(a + da)(t + 1)$ and that $W(t)$ does not pass the line $a(t + 1)$
after $t$.

Using (4) and (5) we then have 


e 2a( R~Z 4 | 28 ar 


—2a( R—-Z 
V 2a‘ ee 
“7 







where I a(t + 1). This reduces to 
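$P = 4ae^{-2a^2}\,\Phi(aU'), \quad U' = (2U - 1)/\sqrt{U(1 - U)},$

where $\Phi(x)$ is the normal (cumulative) distribution function with $m = 0$ and
$\sigma = 1$. Thus

$k(a, U) = 4ae^{-2a^2}\, a\varphi(aU')\, dU'/dU,$

where $\varphi(x)$ denotes the normal frequency function. We have further

$h(U) = \int_0^{\infty} \dfrac{\partial P}{\partial U}\, da = \int_0^{\infty} k(a, U)\, da = 1,$

$g_U(a) = \dfrac{k(a, U)}{h(U)} = 4ae^{-2a^2}\, a\varphi(aU')\, \dfrac{dU'}{dU}; \quad U' = \dfrac{2U - 1}{\sqrt{U(1 - U)}},$

$h_a(U) = \dfrac{k(a, U)}{g(a)} = a\varphi(aU')\, \dfrac{dU'}{dU}; \quad U' = \dfrac{2U - 1}{\sqrt{U(1 - U)}}.$

Finally

$G_U(a) = \int_0^{a} g_U(a)\, da = 1 - \dfrac{2a}{z\sqrt{2\pi}}\, e^{-a^2/(2z^2)} - 2\Phi(-a/z), \quad z = \sqrt{U(1 - U)},$

$H_a(U) = \int_0^{U} h_a(U)\, dU = \Phi(aU'), \quad U' = (2U - 1)/\sqrt{U(1 - U)}.$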
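The marginal and conditional distributions of this section can be checked by numerical integration. The Python sketch below is an illustration under stated assumptions: it takes $k(a, U) = 4a^2 e^{-2a^2}\varphi(aU')\, dU'/dU$, uses $dU'/dU = 1/\{2[U(1-U)]^{3/2}\}$ (obtained here by differentiating $U'$, not quoted from the paper), and writes the conditional distribution function as $G_U(a) = 1 - 2(a/z)\varphi(a/z) - 2\Phi(-a/z)$ with $z = \sqrt{U(1-U)}$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is the frequency function, cdf is Phi

def u_prime(u):
    return (2.0 * u - 1.0) / math.sqrt(u * (1.0 - u))

def du_prime(u):
    """Derivative dU'/dU = 1 / (2 [U(1 - U)]^(3/2))."""
    return 0.5 / (u * (1.0 - u)) ** 1.5

def k_density(a, u):
    """Joint frequency function of the height a and location U of the maximum."""
    return 4.0 * a * a * math.exp(-2.0 * a * a) * STD.pdf(a * u_prime(u)) * du_prime(u)

def simpson(f, lo, hi, steps=4000):
    h = (hi - lo) / steps                # steps must be even
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def h_marginal(u):
    """Marginal frequency function of the location; the upper limit 12 stands in for infinity."""
    return simpson(lambda a: k_density(a, u), 0.0, 12.0)

def g_cond_cdf(a, u):
    """Conditional distribution function G_U(a) of the height given the location."""
    z = math.sqrt(u * (1.0 - u))
    return 1.0 - 2.0 * (a / z) * STD.pdf(a / z) - 2.0 * STD.cdf(-a / z)

print(h_marginal(0.3), h_marginal(0.05))
print(g_cond_cdf(0.5, 0.3), simpson(lambda a: k_density(a, 0.3), 0.0, 0.5))
```

If the formulas are read correctly, the first line should print values close to 1 (the location of the maximum is uniformly distributed), and the two numbers on the second line should agree, since $g_U(a) = k(a, U)$ when $h(U) = 1$.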


Inspecting these conditional distributions, we see that if the value a increases, 
then the probability also increases that this maximum value will be in the middle 
of the underlying theoretical distribution F(U). Further, if the expressions of 
Kolmogorov-Smirnov are used in their ordinary form for testing a hypothesis 
concerning F(U), one undervalues the importance of deviations between the 
empirical and theoretical distributions at the extremes of F(U) compared with 
those in the middle. An alternative procedure could then be to observe the
coordinate $U$ and then use the conditional distribution $G_U(a)$. In this connection,
it should be kept in mind, however, that the distributions derived are limiting
distributions.


REFERENCES 


[1] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain 'goodness of
fit' criteria based on stochastic processes," Ann. Math. Stat., Vol. 23 (1952),
pp. 193-212.
[2] Z. W. Birnbaum and F. H. Tingey, "One-sided confidence contours for probability
distribution functions," Ann. Math. Stat., Vol. 22 (1951), pp. 592-596.
[3] H. Cramér, Mathematical Methods of Statistics, Almqvist and Wiksells, Uppsala,
1945; Princeton University Press, 1946.
[4] M. D. Donsker, "Justification and extension of Doob's heuristic approach to the
Kolmogorov-Smirnov theorems," Ann. Math. Stat., Vol. 23 (1952), pp. 277-281.
[5] J. L. Doob, "Heuristic approach to the Kolmogorov-Smirnov theorems," Ann. Math.
Stat., Vol. 20 (1949), pp. 393-403.
[6] M. Kac, "On some connections between probability theory and differential and
integral equations," Proceedings of the Second Berkeley Symposium on Mathematical
Statistics and Probability, University of California Press, 1951, pp. 189-215.
[7] A. N. Kolmogorov, "Sulla determinazione empirica delle leggi di probabilita,"
Giorn. Ist. Ital. Attuari, Vol. 4 (1933), pp. 83-91.
[8] P. Lévy, Processus Stochastiques et Mouvement Brownien, Paris, 1948.
[9] G. M. Maniya, "Generalisation of the criterion of A. N. Kolmogorov," Doklady Akad.
Nauk SSSR, Vol. 69 (1949), pp. 495-497.
[10] F. J. Massey, Jr., "A note on the estimation of a distribution function by confidence
limits," Ann. Math. Stat., Vol. 21 (1950), pp. 116-119.
[11] F. J. Massey, Jr., "The distribution of the maximum deviation between two sample
cumulative step functions," Ann. Math. Stat., Vol. 22 (1951), pp. 125-128.
[12] N. V. Smirnov, "On the deviation of the empirical distribution function," Rec. Math.
N.S. [Mat. Sbornik], Vol. 6 (1939), pp. 3-26.
[13] J. Wolfowitz, "Non-parametric statistical inference," Proceedings of the Berkeley
Symposium on Mathematical Statistics and Probability, University of California
Press, 1949, pp. 93-113.





CONFIDENCE BANDS FOR POLYNOMIAL CURVES 


By Paul G. Hoel

University of California, Los Angeles¹


1. Summary. A method is given for constructing confidence bands for polynomial
growth-type curves. The method assumes that the mean population
size can be expressed as a polynomial in time and that the generalized $T^2$ function
for the mean values of the observations at fixed time points possesses a
known and parameter-free distribution. Independence between observations
at various times is not assumed. The method yields only a lower bound for the
confidence coefficient.


2. Introduction. Consider a random variable $y_t$ that represents some measurable
characteristic of an individual from a population, or that represents
the size of a population, at time $t$. The graph of $E(y_t)$ as a function of $t$, where $E$
denotes expected value, will be called the mean growth curve.

One of the basic problems in studying growth and related phenomena is that 
of estimating the mean growth curve. In particular, it would be highly desirable 
to be able to construct a confidence band for the mean growth curve. Then the 
experimenter would be able to observe the accuracy of his sample curve as an 
estimate of the mean growth curve. 

A method for getting a confidence band for a mean growth curve should be
such that, corresponding to a given confidence coefficient $C_0$, the probability
is $C_0$ that the entire curve will lie inside the band. This means that the
probability is $C_0$ that for all $t$ the mean growth curve ordinate $E(y_t)$ will lie between
the corresponding ordinates of the two curves determining the confidence band.
In this paper a method is presented for constructing such a confidence band for
polynomial curves, but it yields a band which covers with probability $\ge C_0$
rather than with the exact probability $C_0$.

Although the method will be described from the point of view of polynomial
growth curves, the method is applicable to polynomial curves in general. The
language of growth curves is used for its descriptive convenience and to stress
the fact that the variables $y_1, \dots, y_k$ which are involved are dependent
variables, and hence that standard regression techniques are not applicable to this
problem. The variable $t$ may represent any physical quantity, although it will
be described as time in this discussion.


3. Assumptions. It will be assumed that observations are always made at the
times $t_1, t_2, \dots, t_k$ and that $n > k$ independent sets of such observations are
made. If $y_t$ represents a measurable characteristic of an individual at time $t$,
this implies that $n$ individuals are observed at the same stages of their growth.
If $y_t$ represents the size of a population at time $t$, this implies that the same initial
size population is chosen each time and that the same time pattern for
observations is used. It is possible to vary the initial size population, but the statistical
interpretation then becomes more complicated. Let $y_1, y_2, \dots, y_k$ denote the
values of $y_t$ at the specified time points for a randomly selected individual, or
initial population, and let $\bar{y}_1, \bar{y}_2, \dots, \bar{y}_k$ denote the sample means of those
variables for the $n$ sets of observations.

Received 8/25/52, revised 3/9/54.
¹ Work done under the sponsorship of the Office of Naval Research.

Two basic assumptions will be made. First, it will be assumed that the mean
growth curve is a polynomial of known degree, $k - 1$ or less. Second, it will be
assumed that the distribution of the generalized $T^2$ function for the variables
$y_1, y_2, \dots, y_k$ is known and does not depend upon any unknown parameters.

If one is studying growth problems, the first assumption may seem somewhat
unnatural since exponential functions are often encountered in such problems.
It is usually possible, however, to approximate such curves quite well over limited
time ranges by polynomials of fairly low degree. Furthermore, by choosing a
function of $t$ as the independent variable, or by choosing a convenient function
of $y_t$ as the basic variable and assuming that its mean curve is a polynomial
curve of degree $k - 1$ or less, the range of application of the method is extended
considerably. If, however, a function of $y_t$ is used, the interpretation will be in
terms of the mean of this function rather than in terms of the mean of the
variable. The methods to be presented are actually valid for finding confidence bands
for curves expressible in the form $y_t = a_1 g_1(t) + \cdots + a_k g_k(t)$, where the $g_i(t)$
are any given functions of $t$. For such more general curves, however, the formulas
derived in Section 4 for polynomials are not applicable.

The second assumption will be satisfied if the variables $y_1, y_2, \dots, y_k$ are
jointly normally distributed, because then the distribution of $T^2$ is well known
[1]. Even though the variables $y_1, y_2, \dots, y_k$ are not jointly normally
distributed, the second assumption may still be considered to be satisfied if $n$ is
fairly large, because it can be shown that under mild restrictions $T^2$ possesses
an asymptotic chi square distribution [2].


4. Derivation. Let the mean growth curve be a polynomial curve of degree
$r - 1 \le k - 1$ and let the ordinates on this curve for the times $t_1, t_2, \dots, t_k$
be denoted by $\mu_1, \mu_2, \dots, \mu_k$. If the first $r$ of the points $(t_1, \mu_1), (t_2, \mu_2), \dots,
(t_k, \mu_k)$ are chosen to determine the curve, its equation can be written in the
Lagrange polynomial form

(1) $y = \sum_{i=1}^{r} \dfrac{(t - t_1) \cdots (t - t_{i-1})(t - t_{i+1}) \cdots (t - t_r)}{(t_i - t_1) \cdots (t_i - t_{i-1})(t_i - t_{i+1}) \cdots (t_i - t_r)}\, \mu_i.$

The coordinates of the remaining $k - r$ points must of course satisfy equation (1).
Now introduce the variables $x_1, x_2, \dots, x_r$ defined by

(2) $x_i = \dfrac{(t - t_1) \cdots (t - t_{i-1})(t - t_{i+1}) \cdots (t - t_r)}{(t_i - t_1) \cdots (t_i - t_{i-1})(t_i - t_{i+1}) \cdots (t_i - t_r)}.$

Then (1) may be written in the form

(3) $y = \mu_1 x_1 + \mu_2 x_2 + \cdots + \mu_r x_r.$

In the coordinate system $(x_1, \dots, x_r, y)$, equation (3) represents an $r$-parameter
$(\mu_1, \dots, \mu_r)$ family of planes passing through the origin. The method to be
presented for constructing a confidence band for (1) is based on finding the envelope
of this family, subject to a single restriction on the parameters. This method is
a generalization of a similar method used by Hotelling and Working [3] to obtain
a confidence band for a line of regression. An extension of their method to more
general problems is given in [4].

The restriction that will be placed on the parameters $\mu_1, \dots, \mu_r$ is obtained
by means of the generalized $T^2$ function. For the variables $y_1, y_2, \dots, y_k$
Hotelling's generalized $T^2$ is defined by

(4) $T^2 = (n - 1) \sum_{i=1}^{k} \sum_{j=1}^{k} s^{ij} (\bar{y}_i - \mu_i)(\bar{y}_j - \mu_j),$

where $(s^{ij}) = (s_{ij})^{-1}$ and where $s_{ij}$ is the sample covariance,

$s_{ij} = \dfrac{1}{n} \sum_{\alpha=1}^{n} (y_{i\alpha} - \bar{y}_i)(y_{j\alpha} - \bar{y}_j).$

Under the assumptions made in the preceding section, a value of $T^2$ can be found,
which will be denoted by $T_0^2$, such that

(5) $P\{T^2 \le T_0^2\} = C_0,$

where $C_0$ is a given number satisfying $0 < C_0 < 1$. The number $C_0$ will be the
lower bound for the confidence coefficient corresponding to the confidence
band to be constructed. In terms of the preceding notation, the restriction that
will be placed on the parameters of (3) is the restriction

(6) $T^2 \le T_0^2.$
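A small numerical illustration of the generalized $T^2$ statistic may help fix the notation. The Python sketch below assumes the sample covariance is taken with divisor $n$, so that $T^2 = (n - 1)(\bar{y} - \mu)' S^{-1} (\bar{y} - \mu)$; that reading, the function names, and the data are assumptions made for this example only. A plain Gaussian elimination is used so that no external library is needed.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def hotelling_t2(samples, mu):
    """samples[alpha][i] holds y_i for the alpha-th set; mu is the hypothesised mean vector."""
    n, k = len(samples), len(mu)
    ybar = [sum(row[i] for row in samples) / n for i in range(k)]
    # Sample covariance with divisor n (assumed reading of the definition).
    s = [[sum((row[i] - ybar[i]) * (row[j] - ybar[j]) for row in samples) / n
          for j in range(k)] for i in range(k)]
    d = [ybar[i] - mu[i] for i in range(k)]
    w = solve(s, d)                      # w = S^{-1} (ybar - mu)
    return (n - 1) * sum(di * wi for di, wi in zip(d, w))

data = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]   # n = 4 sets, k = 2 time points
print(hotelling_t2(data, [2.0, 2.0]))
```

With a single variable ($k = 1$) the statistic reduces to the square of the ordinary Student $t$ for testing a mean, which gives a quick check on the divisor convention.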

From the remark made after (1), it follows that the parameters $\mu_{r+1}, \dots,
\mu_k$ can be expressed as linear combinations of $\mu_1, \dots, \mu_r$ and that therefore
restriction (6) can be expressed as a restriction on $\mu_1, \dots, \mu_r$ only.

Now the technique for finding the envelope of an $r$-parameter family of
surfaces such as (3), subject to a single restriction on those parameters such as
$T^2 = T_0^2$, consists in first using the restriction to express (3) as an $(r - 1)$-
parameter family of surfaces, and then eliminating those parameters between
(3) and the $r - 1$ equations obtained by differentiating (3) with respect to those
$r - 1$ parameters. But analytically this technique is equivalent to that
employed in finding the maximum and minimum of the function $y = \mu_1 x_1 + \cdots +
\mu_r x_r$ for fixed $x$'s when the $\mu$'s are subject to restriction (6). The analysis will be
carried out from the latter point of view with the aid of matrix algebra.

Let (4) for T Ty be written in the form 


k oh 
DL aja: — Glu; — H) =, 
. s 





CONFIDENCE BANDS 537 


1 rp . ° 
where a,; = 8’? and Xy = To/(n — 1). If the parentheses are removed this equa 
tion assumes the form 


k ok k 

(7) » 2 umn — 2 Do aim + Dp = Do. 

Let a denote the vector of a,’s, 7 the vector of §,’s, and A the matrix of a,,’s 
It will be seen that 

(8) = Aj, 

and that A, is the quadratic form in 9, , «++ , J, given by 

(9) h = 7A jf. 


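Denoting $x_i(t_j)$ by $x_{ij}$, it follows from (1) that $\mu_j = \mu_1x_{1j} + \cdots + \mu_rx_{rj}$ for $j = r + 1, \dots, k$. By means of these relations, (7) can be reduced to an equation in the parameters $\mu_1, \dots, \mu_r$ only. This reduction can be accomplished by means of the transformation

(10)    $\mu = B\nu,$

where $\nu_i = \mu_i$ for $i = 1, \dots, r$, and where $B$ is the $k \times k$ matrix

(11)    $B = \begin{pmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \\ x_{1,r+1} & \cdots & x_{r,r+1} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{rk} & 0 & \cdots & 0 \end{pmatrix}.$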


The reduction of (7) by means of (10) then proceeds as follows:

(12)    $\mu'A\mu - 2a'\mu + \lambda_1 = \lambda_0,$
        $\nu'B'AB\nu - 2a'B\nu + \lambda_1 = \lambda_0,$
        $\nu'C\nu - 2c'\nu + \lambda_1 = \lambda_0,$

where

(13)    $C = B'AB$ and $c = B'a.$


Since the last $k - r$ columns of $B$ consist of zero elements, the matrix $C$ will contain zero elements in its last $k - r$ rows and columns, and the column vector $c$ will have zeros for its last $k - r$ components. In summation notation, (12) will therefore assume the form

(14)    $\sum_{i=1}^{r}\sum_{j=1}^{r} c_{ij}\mu_i\mu_j - 2\sum_{i=1}^{r} c_i\mu_i + \lambda_1 = \lambda_0,$

since $\nu_i = \mu_i$ for $i = 1, \dots, r$.







It will be convenient to express (14) in the form

(15)    $\sum_{i=1}^{r}\sum_{j=1}^{r} c_{ij}(\mu_i - \alpha_i)(\mu_j - \alpha_j) = \lambda_2.$

Expanding (15) and comparing with (14), it is readily observed that $c = C\alpha$, where $\alpha$ denotes the vector of $\alpha_i$'s, or that

(16)    $\alpha = C^{-1}c.$

Here, and in the formulas that follow, $C$ denotes the $r \times r$ matrix consisting of the first r rows and columns of the matrix $C$ of (13), and $c$ denotes the r-dimensional column vector consisting of the first r components of the vector $c$ of (13). It is assumed that $C^{-1}$ exists. This assumption will be satisfied if $(s_{ij})$ is nonsingular, and the latter condition will be satisfied with probability one. The comparison of (14) and (15) also shows that $\lambda_2 = \lambda_0 - \lambda_1 + \alpha'C\alpha$, which because of (9) and (16) can be written in the form

(17)    $\lambda_2 = \lambda_0 - \bar y'A\bar y + c'C^{-1}c.$


Now return to the problem of maximizing the function $y = \mu_1x_1 + \cdots + \mu_rx_r$ for fixed x's subject to the restriction (6), which, because of the preceding analysis leading to (15), is equivalent to the restriction

(18)    $\sum_{i=1}^{r}\sum_{j=1}^{r} c_{ij}(\mu_i - \alpha_i)(\mu_j - \alpha_j) \le \lambda_2.$


In the parameter space $\mu_1, \dots, \mu_r$, the function y is the scalar product of the two vectors $(x_1, \dots, x_r)$ and $(\mu_1, \dots, \mu_r)$, whereas the restriction (18) states that the terminus of the vector $(\mu_1, \dots, \mu_r)$ must lie inside or on the ellipsoid whose equation is given by (15). The scalar product is conveniently interpreted here as the length of the x vector multiplied by the projected length of the $\mu$ vector projected onto the x vector, together with the proper sign. The vector $\mu$ whose terminus must lie inside or on the ellipsoid (15) and which has maximum x-directed projected length is a vector whose terminus is a point on the ellipsoid where the tangent plane to the ellipsoid is perpendicular to the x vector. There will be two points of this type, one yielding a maximum and the other a minimum. The coordinates of these two points can be found as follows.

Let the equation of the ellipsoid (15) be written in the form $F = 0$, where

$F = \sum_{i=1}^{r}\sum_{j=1}^{r} c_{ij}(\mu_i - \alpha_i)(\mu_j - \alpha_j) - \lambda_2.$

Direction numbers for the normal to the tangent plane of the ellipsoid are given by the derivatives

$F_{\mu_i} = 2[c_{i1}(\mu_1 - \alpha_1) + \cdots + c_{ir}(\mu_r - \alpha_r)], \quad i = 1, \dots, r.$


If the tangent plane is to be perpendicular to the x vector, these direction numbers must be proportional to the components of the x vector; hence it is necessary that $F_{\mu_i} = 2\rho_0x_i$, where $2\rho_0$ is the constant of proportionality. In matrix notation this becomes $C(\mu - \alpha) = \rho_0x$. Hence

(19)    $\mu - \alpha = \rho_0C^{-1}x.$

The constant $\rho_0$ is determined by realizing that $\mu_1, \dots, \mu_r$ must satisfy equation (15); hence

$(\rho_0C^{-1}x)'C(\rho_0C^{-1}x) = \lambda_2.$

Solving for $\rho_0$ yields

(20)    $\rho_0 = \pm\sqrt{\lambda_2/x'C^{-1}x}.$


The maximizing vector is the vector given by (19) when the positive value in (20) is selected. The maximum value of the function $y = \mu'x$ is therefore given by

$y_{\max} = (\alpha + \rho_0C^{-1}x)'x = \alpha'x + \rho_0x'C^{-1}x = \alpha'x + \sqrt{\lambda_2}\,\sqrt{x'C^{-1}x}.$

Finally, if the values obtained in (16) and (17) are substituted, this will become

(21)    $y_{\max} = c'C^{-1}x + \sqrt{\lambda_0 - \bar y'A\bar y + c'C^{-1}c}\,\sqrt{x'C^{-1}x}.$

The minimum value of y is obtained by using the negative value in (20); hence

(22)    $y_{\min} = c'C^{-1}x - \sqrt{\lambda_0 - \bar y'A\bar y + c'C^{-1}c}\,\sqrt{x'C^{-1}x}.$
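
The closed form for the extremes of $y = \mu'x$ over the ellipsoid can be checked numerically. The following is a minimal sketch, not part of the original paper: it takes $r = 2$, uses made-up values for the matrix `C`, the center `alpha`, the constant `lam2`, and the vector `x`, and compares $\alpha'x \pm \sqrt{\lambda_2\,x'C^{-1}x}$ with a direct scan of the ellipse boundary. All function names are hypothetical.

```python
import math

def solve2(C, b):
    """Solve the 2x2 linear system C z = b by Cramer's rule."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    z0 = (b[0] * C[1][1] - C[0][1] * b[1]) / det
    z1 = (C[0][0] * b[1] - C[1][0] * b[0]) / det
    return [z0, z1]

def closed_form_limits(C, alpha, lam2, x):
    """Return (y_min, y_max) = alpha'x -/+ sqrt(lam2 * x'C^{-1}x)."""
    w = solve2(C, x)                              # w = C^{-1} x
    quad = x[0] * w[0] + x[1] * w[1]              # x' C^{-1} x
    centre = alpha[0] * x[0] + alpha[1] * x[1]    # alpha' x
    half = math.sqrt(lam2 * quad)
    return centre - half, centre + half

def scanned_limits(C, alpha, lam2, x, steps=20000):
    """Scan the boundary (mu-alpha)'C(mu-alpha) = lam2 and record the extremes of mu'x."""
    # Cholesky factor C = L L' of a symmetric positive definite 2x2 matrix.
    l11 = math.sqrt(C[0][0])
    l21 = C[1][0] / l11
    l22 = math.sqrt(C[1][1] - l21 * l21)
    s = math.sqrt(lam2)
    lo, hi = float("inf"), float("-inf")
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        # Solve L'(mu - alpha) = s (cos theta, sin theta) by back substitution.
        d1 = s * math.sin(theta) / l22
        d0 = (s * math.cos(theta) - l21 * d1) / l11
        y = (alpha[0] + d0) * x[0] + (alpha[1] + d1) * x[1]
        lo, hi = min(lo, y), max(hi, y)
    return lo, hi

# Illustrative values only (assumptions, not data from the paper).
C = [[2.0, 0.5], [0.5, 1.0]]
alpha = [1.0, -2.0]
lam2 = 3.0
x = [0.3, 0.7]
exact = closed_form_limits(C, alpha, lam2, x)
scanned = scanned_limits(C, alpha, lam2, x)
print(exact, scanned)
```

Because y is linear in $\mu$, its extremes over the solid ellipsoid are attained on the boundary, so scanning the boundary alone is enough for the comparison.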

5. Interpretation. Since, by (2), the vector x has components that are polynomials in t, equations (22) and (21) define two curves in the t, y plane such that the curve (1) will lie between these two curves if restriction (6) is satisfied. From (5) the probability is at least $C_0$ that the mean growth curve (1) will lie between the curves (22) and (21). The latter probability would be exactly $C_0$ if only parameter points satisfying (6) could yield mean growth curves lying between curves (22) and (21); unfortunately, however, when $r \ge 3$ there are parameter points not satisfying (6) which yield such curves. As a result, $C_0$ is only a lower bound for the confidence coefficient corresponding to the confidence band determined by (22) and (21).

The difficulty encountered in the preceding paragraph is best explained by 
considering a special case. Assume that k = 4,r = 3, and that the ellipsoid (15) 
is a sphere of radius } with center at the origin. Now it follows from the definition of the $x_i$ in (2) that $\sum x_i = 1$; hence the x vector will have its terminus lying in the plane $\sum x_i = 1$. As t varies, the terminus will describe a curve in
this plane that is easily seen to be a parabola with vertex at the point (0, 1, 0) 
and passing through the points (1, 0, 0) and (0, 0, 1). 
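
The geometric facts used here can be checked directly. The sketch below is an illustration only, and it assumes (as the formulas quoted in the illustration of Section 6 suggest) that the $x_i$ of (2) are the Lagrange interpolation polynomials on the first r time points, taken here to be 0, 1, 2. The function name `lagrange_basis` is hypothetical.

```python
def lagrange_basis(t, nodes):
    """Values at t of the Lagrange interpolation polynomials on the given nodes."""
    values = []
    for i, ti in enumerate(nodes):
        term = 1.0
        for j, tj in enumerate(nodes):
            if j != i:
                term *= (t - tj) / (ti - tj)
        values.append(term)
    return values

nodes = [0.0, 1.0, 2.0]   # assumed first r = 3 time points
for t in (0.0, 0.5, 1.0, 2.0, 3.0):
    x = lagrange_basis(t, nodes)
    # The components always sum to 1, so the terminus of x stays in one plane.
    print(t, x, sum(x))
```

At $t = 0, 1, 2$ the vector x is a unit coordinate vector, which is why the curve passes through (1, 0, 0), (0, 1, 0) and (0, 0, 1).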

As the x vector describes the parabola, its intersection with the sphere will 
describe a curve on the sphere. This curve will be the locus of the maximizing 
points on the sphere, because the tangent plane at any point on this curve will 
be perpendicular to the x vector through that point. The curve on the sphere 
which is symmetrically opposite this curve will be the locus of the minimizing 
points. Now as t varies over its range of values, the tangent plane along the maximizing curve and the tangent plane along the minimizing curve will generate a closed surface. Every point inside this surface will yield a value of the function $y = \mu_1x_1 + \mu_2x_2 + \mu_3x_3$ that lies between $y_{\min}$ and $y_{\max}$ for all values of t.

The closed surface here will resemble, very roughly, the surface formed by two 
right circular cylinders, of diameters equal to the diameter of the sphere, which 
intersect at right angles. The actual confidence coefficient here would appear to be not appreciably larger than $C_0$, since most of the parameter points inside this surface will also lie inside the sphere. For more complicated situations, however, the relationship between $C_0$ and the actual confidence coefficient is unknown.

A minor difficulty with the method is that for some sample points the relationship between the $\mu$'s, given by $\mu_j = \mu_1x_{1j} + \cdots + \mu_rx_{rj}$ $(j = r + 1, \dots, k)$, will be inconsistent with restriction (6). Geometrically, this means that the random ellipsoid determined by (6) does not intersect all the planes determined by this relationship. For such sample points, the radical in (22) and (21) will be imaginary. The probability that this event will occur is undoubtedly quite small for most applications. As an illustration that is unrealistic but simple to compute, if n is large, $k = 5$, $r = 4$, and (6) is assumed to be a spherical restriction, it can be shown that the probability is less than .001 that inconsistency will occur when $C_0 = .95$. For larger values of $C_0$, such as $C_0 = .99$, the probability is extremely small that inconsistency will occur. Since nonintersection can occur only when the parameter point does not satisfy (6), the lower bound $C_0$ for the confidence coefficient still applies, provided one interprets imaginary confidence bands as bands incapable of covering any growth curve. If one excludes the nonintersection cases, the conditional probability of covering the mean growth curve will be slightly larger than the unconditional probability.

The choice of the first r points to determine the curve (1) was arbitrary. In- 
vestigations have not been made on how best to choose the points so that com- 
putations become simple, nor on how best to utilize the data. The problem of 
constructing a confidence band with known confidence coefficient by the method 
of this paper appears to be very difficult, if it is at all possible. 


6. Illustration. The calculations involved in using formulas (21) and (22) will be illustrated by a simple example. Consider the problem of finding a 90 per cent confidence band (lower bound) for a parabola when ten individuals observed at each of four equally spaced time points yield $t_i = 0, 1, 2, 3$; $\bar y_i = 5.0, 5.4, 6.0, 6.9$; and







Fig 1. 


Here $k = 4$, $r = 3$, $n = 10$, and $C_0 = .90$. Using (2), $x_1 = \tfrac12(t^2 - 3t + 2)$, $x_2 = -t^2 + 2t$, $x_3 = \tfrac12(t^2 - t)$. For $t = 3$ these give $x_{14} = 1$, $x_{24} = -3$, $x_{34} = 3$. These values enable one to write down $B$ in (11). The inverse of $(s_{ij})$ yields

$A = \begin{pmatrix} 1.426 & .333 & -.051 & .554 \\ .333 & 1.288 & .149 & -.274 \\ -.051 & .149 & 1.357 & .593 \\ .554 & -.274 & .593 & 1.683 \end{pmatrix}.$

From (13) and (8) it follows that $c = B'A\bar y$. First $B'A$ is computed, then $c = B'A\bar y$ and $C = B'AB$ are computed. Next $C^{-1}$ is computed, and then $c'C^{-1}$ and $c'C^{-1}c$. Finally, $\bar y'A\bar y$ is computed. For this illustration, computations yielded the values

$c' = (28.906, -41.760, 62.167),$
$c'C^{-1} = (5.0099, 5.3875, 6.0130),$
$c'C^{-1}c = 293.644, \qquad \bar y'A\bar y = 293.673.$







The value of $\lambda_0 = T_0^2/(n - 1)$ can be found by means of tables of the F distribution, using the relation $T_0^2 = F_0(n - 1)k/(n - k)$, where $\nu_1 = k$ and $\nu_2 = n - k$, or by means of tables of the incomplete beta function, or by numerically solving the proper equation (see [1]). For this illustration $\lambda_0$ will be found to have the value $\lambda_0 = 2.121$. With the above computations completed, equations (21) and (22) can be written down in terms of the x's. If the x's are replaced by their expressions in terms of t, (21) and (22) reduce to

$y = .124t^2 + .254t + 5.010 \pm \sqrt{(.174t^4 - 1.067t^3 + 2.109t^2 - 1.469t + .640)\,2.092}.$


The graphs of these two curves, together with the values of the $\bar y_i$, are shown in Figure 1.

If the equations of the two curves determining the confidence band are not needed, the graphs can be constructed much faster by using equations (21) and (22) expressed in terms of the x's, rather than in terms of t, and calculating the x's corresponding to convenient t values. When $t = t_j$ and $j = 1, \dots, r$, it follows from (2) that $x_i(t_j) = \delta_{ij}$, and hence that $x'C^{-1}x$ reduces to the element in the jth row and jth column of $C^{-1}$. The quantity $c'C^{-1}x$ then reduces to the jth component of the row vector $c'C^{-1}$. Although the computations are more difficult for $j > r$, they are still simple.
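
The steps of the illustration can be retraced in a few lines of code. The sketch below is not from the paper. It assumes the matrix $A$ as read from the source, where the entries 1.288 and -.274 in the second row are restored so that $A$ is symmetric and reproduces the printed vector $c'$. It also takes $\lambda_0 = 2.121$ as quoted in the text, and the helper names are hypothetical. Since $\lambda_2$ in (17) is a small difference of two numbers near 293.6, the value obtained from a three-decimal matrix need not reproduce the printed 2.092 to the last digit.

```python
import math

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Inverse of the sample covariance matrix, as read from the illustration.
A = [[1.426, 0.333, -0.051, 0.554],
     [0.333, 1.288, 0.149, -0.274],
     [-0.051, 0.149, 1.357, 0.593],
     [0.554, -0.274, 0.593, 1.683]]
ybar = [5.0, 5.4, 6.0, 6.9]
lam0 = 2.121                      # lambda_0 = T_0^2/(n - 1), quoted in the text

# First r rows of B': identity block followed by x_i(t_4) = (1, -3, 3).
Bt = [[1.0, 0.0, 0.0, 1.0],
      [0.0, 1.0, 0.0, -3.0],
      [0.0, 0.0, 1.0, 3.0]]

Ay = [dot(row, ybar) for row in A]
c = [dot(brow, Ay) for brow in Bt]                                   # c = B'A ybar
BtA = [[dot(brow, [A[k][j] for k in range(4)]) for j in range(4)] for brow in Bt]
C = [[dot(BtA[i], Bt[j]) for j in range(3)] for i in range(3)]       # C = B'AB (r x r part)
alpha = solve(C, c)                                                  # alpha = C^{-1} c, eq. (16)
yAy = dot(ybar, Ay)
lam2 = lam0 - yAy + dot(c, alpha)                                    # eq. (17)

def band(t):
    """Lower and upper band values (22) and (21) at time t."""
    x = [0.5 * (t * t - 3 * t + 2), -t * t + 2 * t, 0.5 * (t * t - t)]
    centre = dot(alpha, x)                                           # c'C^{-1}x
    half = math.sqrt(lam2 * dot(x, solve(C, x)))
    return centre - half, centre + half

print(c, alpha, yAy, lam2)
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, band(t))
```

Evaluating `band` on a grid of t values gives the two curves directly, which is the shortcut described in the preceding paragraph.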


REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, pp. 407-409.

[2] H. B. Mann and A. Wald, "On stochastic limit and order relationships," Ann. Math. Stat., Vol. 14 (1943), p. 224.

[3] H. Hotelling and H. Working, "Application of the theory of error to the interpretation of trends," J. Amer. Stat. Assoc., Suppl., Vol. 24 (1929), pp. 73-85.

[4] P. G. Hoel, "Confidence regions for linear regression," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, pp. 75-81.





RANDOM FUNCTIONS SATISFYING CERTAIN LINEAR RELATIONS 
By S. G. Ghurye
University of North Carolina¹


Introduction and Summary. A hypothesis often made about a sequence of real-valued r.v. (random variables), $\{X_n, n = 0, \pm 1, \pm 2, \dots\}$, is that there exist certain real constants $a_1, a_2, \dots, a_k$ such that if we write

(1)    $Y_n = X_n + a_1X_{n-1} + \cdots + a_kX_{n-k},$

then $\{Y_n\}$ is a sequence of independent r.v. Now, very often in practice, the observed sequence $\{X_n\}$ consists in observations made at equidistant t-points on a stochastic process with a continuous parameter t. Restricting our attention to the case $k = 1$, we then have a r.f. (random function) $X(t)$, defined for all t in an interval, with the following property: There exist a value of t ($t_0$, say), and $h > 0$, and a real number $a$ such that a hypothesis of the type mentioned is satisfied by the sequence $\{Y(t_0, h; n), n = 0, \pm 1, \pm 2, \dots\}$, where

(2)    $Y(t_0, h; n) = X(t_0 + nh) - aX(t_0 + [n - 1]h).$


But if such a hypothesis is true for one value of h, it is not necessarily so for 
some other value; and we have to make the additional assumption that the 
particular length of the ¢-intervals with which we are concerned is precisely 
the one for which the hypothesis holds. This assumption may not be reasonable 
in every case. Instead we may wish to work with a hypothesis similar to the
above, but which holds for all positive h in some interval. 

In this paper, we investigate the existence and form of random functions 
satisfying a hypothesis of this type. Section 1 contains a statement of the problem 
and some simple results. It turns out that any random function possessing the 
required property can be expressed as the product of an exponential function 
of t and a r.f. with independent increments. Section 2 deals with the limit in
distribution of a sequence of Stieltjes approximating sums involving a r.f. 
with independent increments. Finally, in Section 3, these results are applied to 
the problem under investigation, and further possibilities are investigated. 

It must be noted that, in this investigation, we are concerned only with 
results in distribution. That is to say, we are dealing throughout only with 
parametric families of probability laws and talking in terms of r.f. merely for 
convenience. 


1. The problem and some preliminary results. We consider a real-valued random vector function $X(t)$ whose transpose $X'(t)$ is the row vector $\{X^{(1)}(t), \dots, X^{(p)}(t)\}$, defined and continuous in probability for all $t \ge$ some $t_0$.


Received 7/24/53 
Work done under the sponsorship of the Office of Naval Research 




We suppose that there exists a real-valued, $p \times p$ matric function $A(h) = \{a^{ij}(h)\}$ defined and continuous for $h \ge 0$, and such that if we write

(3)    $Y(n; h) = X(t_0 + nh) - A(h)X[t_0 + (n - 1)h],$

then for any $h > 0$ and any integer N, $X(t_0), Y(1; h), \dots, Y(N; h)$ are mutually independent.

We shall be concerned with the existence of such random functions X(t) and 
with the functional form of A(h), the probability law of X(t), and other ques- 
tions of this sort. To start with, we shall prove 

LEMMA 1. Let $X' = \{X^{(1)}, \dots, X^{(m)}\}$ and $Y' = \{Y^{(1)}, \dots, Y^{(n)}\}$ be independent random vectors, and let there exist an $m \times n$ matrix A of rank r such that $X + AY$ is independent of Y. Then there exist at least r linearly independent n-vectors $c(j)$ such that, with probability 1, $c'(j)Y$ is a constant, $j = 1, 2, \dots, r$.

PROOF. Let $t' = \{t^{(1)}, \dots, t^{(m)}\}$, $u' = \{u^{(1)}, \dots, u^{(n)}\}$, and $f(t, u) = E\{e^{i(t'X + u'Y)}\}$. Then the independence of X and Y implies

$f(t, u) = f(t, 0)f(0, u), \qquad f(t, A't + u) = f(t, 0)f(0, A't + u).$

From the independence of $X + AY$ and Y, we have

$f(t, A't + u) = f(t, A't)f(0, u) = f(t, 0)f(0, A't)f(0, u).$

Since $f(0, 0) = 1$ and $f(t, u)$ is continuous, there exists a region, $t't < \tau^2$, in which $f(t, 0) \ne 0$. Consequently, for any t in this region and any u, we have

$f(0, A't + u) = f(0, A't)f(0, u).$


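Since $A'$ is $n \times m$ and of rank r, there exist nonsingular matrices B and C such that

$A' = C'\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}B.$

Let $Bt = v$, $u = C'w$, $\tilde v' = \{v^{(1)}, \dots, v^{(r)}, 0, \dots, 0\}$, $\tilde w' = \{w^{(1)}, \dots, w^{(r)}, 0, \dots, 0\}$, and $f(0, C's) = g(s)$. Then, for any $\tilde v$ in a certain neighbourhood of the origin and for any $\tilde w$,

$g(\tilde v + \tilde w) = g(\tilde v)g(\tilde w)$ and $g(\tilde v) = e^{\alpha'\tilde v + i\beta'\tilde v}.$

Since g is a characteristic function, $\alpha = 0$ and $f(0, C'\tilde v) = e^{i\beta'\tilde v}$. It follows that if $c(j)$ be the jth column of $C'$, then

$\Pr\{c'(j)Y = \beta^{(j)}\} = 1, \quad j = 1, 2, \dots, r.$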


COROLLARY 1. If Y is not linearly singular, in the sense that there is no $c'Y$ which is a constant with probability 1, if $X + AY$ is independent of Y, and $X + BY$ is independent of Y, then $A = B$.

THEOREM 1. Let $X(t)$ and $A(h)$ be as assumed at the start of the section, with the additional conditions that $X(t_0)$ is not linearly singular and $A(h)$ is nonsingular for some $h > 0$.

Then $A(h)$ can be written in the form $e^{hA}$, where A is a constant matrix. Further $X^*(t) = e^{-tA}X(t)$ is an additive process. Conversely, if A is any constant matrix and $X^*(t)$ an additive process, $X(t) = e^{tA}X^*(t)$ is of the desired type with $A(h) = e^{hA}$.

PROOF. For any $h > 0$ and any m, $X(t_0), Y(1; hm^{-1}), \dots, Y(j; hm^{-1}), \dots, Y(m; hm^{-1})$ are independent. Therefore $\sum_{j=0}^{m-1}\{A(hm^{-1})\}^jY(m - j; hm^{-1})$ is independent of $X(t_0)$; that is, $X(t_0 + h) - \{A(hm^{-1})\}^mX(t_0)$ is independent of $X(t_0)$. Hence by Corollary 1, $\{A(hm^{-1})\}^m = A(h)$. It follows, by the usual continuity argument, that $A(h_1)A(h_2) = A(h_1 + h_2)$.

Let $h_1$ be such that $A(h_1)$ is nonsingular, and let $h_2 \to 0$. Then $\lim_{h \to 0} A(h) = A(0)$ is the identity matrix $I$. Nagy [11] has shown that, under these conditions, there exists a constant matrix A such that $A(h) = e^{hA}$ and

$A = \lim_{h \downarrow 0}\{A(h) - I\}h^{-1}.$
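
The relations $\{A(hm^{-1})\}^m = A(h)$, $A(h_1)A(h_2) = A(h_1 + h_2)$ and $A = \lim\{A(h) - I\}h^{-1}$ can be illustrated numerically for a family of the form $e^{hA}$. The sketch below is only an illustration: the $2 \times 2$ matrix `gen` is a made-up example, and the power-series routine `mat_exp` is a simple stand-in for a library matrix exponential, adequate for small matrices of moderate norm.

```python
def mat_mul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def mat_exp(M, terms=40):
    """Matrix exponential by its power series."""
    n = len(M)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, M)]   # M^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def transition(h, gen):
    """A(h) = exp(h * gen)."""
    return mat_exp([[h * g for g in row] for row in gen])

def mat_pow(P, m):
    result = identity(len(P))
    for _ in range(m):
        result = mat_mul(result, P)
    return result

def max_abs_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(len(P)) for j in range(len(P)))

gen = [[-0.5, 0.2], [0.1, -0.3]]      # hypothetical constant matrix A
h, m = 0.7, 8
power_gap = max_abs_diff(mat_pow(transition(h / m, gen), m), transition(h, gen))
semigroup_gap = max_abs_diff(mat_mul(transition(0.4, gen), transition(1.1, gen)),
                             transition(1.5, gen))
small = 1e-6
Ah = transition(small, gen)
derivative = [[(Ah[i][j] - float(i == j)) / small for j in range(2)] for i in range(2)]
limit_gap = max_abs_diff(derivative, gen)
print(power_gap, semigroup_gap, limit_gap)
```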
Now lett <4 <th th S th < to + ke, and 4 — & = h. Then X(t), 
Y(j;hn"),j7 = 1,2, +++, n,n + 1, --- , are independent r.v. Hence 
X(t, + mhn™) — A(mbhn™)X(t), 
X(t + mshn™) — A{(ms — m)hn™"} X(t, + mhn™), 


are mutually independent if m S mz < m;. By letting n, m , m2, m;— ~ 


in a suitable manner, we can show that the r.v. (4) converge in distribution 
respectively to 


(4) X (to) 


(5)    $X(t_0), \quad X(t_1 + h_1) - A(h_1)X(t_1), \quad X(t_2 + h_2) - A(h_2)X(t_2).$

Hence, these are mutually independent. In the same manner, we see that for any integer k and any $t_1, t_2, \dots, t_k$ in $[t_0, t]$, $X(t + h) - A(h)X(t)$ is independent of $X(t_1), \dots, X(t_k)$, so that $X^*(t + h) - X^*(t)$ is independent of $X^*(t_1), \dots, X^*(t_k)$. In other words, $X^*(t)$ is a random function with independent increments which are also independent of $X^*(t_0)$. For convenience, we shall use Paul Lévy's terminology and call $X^*(t)$ an additive process.

The final, converse statement of the theorem is obvious. 


2. Concerning additive processes. The general form of the l.c.f. (logarithm of the characteristic function) of an additive process $Z(t)$ which is continuous in probability has been derived by Paul Lévy [7] and [8] (for alternate derivations and forms, see Doob [1], Feller [2], Gnedenko [4] and Khintchine [6]). It may be expressed as follows.

Let $u' = \{u^{(1)}, \dots, u^{(p)}\}$ be a vector variable, and $t_1$, $t_2$ be any real numbers ($t_1 < t_2$). Then

$\log E\{e^{iu'[Z(t_2) - Z(t_1)]}\} = \psi(u; t_2) - \psi(u; t_1),$

(7)    $\psi(u; t) = iu'\mu(t) - \tfrac12u'\Sigma(t)u + \int_{R(p)}\{e^{iu'x} - 1 - iu'x(1 + x'x)^{-1}\}(1 + x'x)(x'x)^{-1}\,d_xG(x; t),$
R(p) 





where $R(p)$ is the whole p-dimensional Euclidean space of x. Also, both $\mu'(t) = \{\mu^{(1)}(t), \dots, \mu^{(p)}(t)\}$ and

$\Sigma(t) = \begin{pmatrix} \sigma^{11}(t) & \cdots & \sigma^{1p}(t) \\ \vdots & & \vdots \\ \sigma^{p1}(t) & \cdots & \sigma^{pp}(t) \end{pmatrix} = \Sigma'(t)$

are continuous, and for every t and any $h > 0$, $\Sigma(t + h) - \Sigma(t)$ is nonnegative definite. Further, $G(x; t)$ is a function of the p-vector x and the real variable t such that


(8)
(i) for $\Delta x^{(j)} \ge 0$, $j = 1, 2, \dots, p$, the mixed difference $\Delta_xG(x; t) \ge 0$ for every t,
(ii) $\int_{R(p)} d_xG(x; t) < \infty$,
(iii) for any t, $G(x; t)$ assigns zero measure to the point $x = 0$ of $R(p)$,
(iv) G is continuous in t for all x,
(v) if $t_1 < t_2$ and $\Delta x^{(j)} \ge 0$, $j = 1, 2, \dots, p$, then $[\Delta_xG(x; t)]_{t_1}^{t_2} \ge 0$.


’ 


LEMMA 2. Let $\Sigma(t) = \{\sigma^{ij}(t)\}$ be a $p \times p$ symmetric matric function of t, defined and bounded for all t in a closed interval T and such that $\Sigma(t_2) - \Sigma(t_1)$ is nonnegative definite for any $t_1 < t_2$ (in T). Then all elements of $\Sigma(t)$ are of b.v. (bounded variation) in T.

Hence, if $\mu(t)$ is of b.v., so is $\psi(u; t)$ of b.v. in t for every u.

NOTATION. In what follows, $A(t) = \{a^{ij}(t)\}$ being a $p \times p$ matrix, we write $\int_{t_0}^{t} dA(v)\mu(v)$ for the column-vector whose ith element is

$\sum_{j=1}^{p}\int_{t_0}^{t}\mu^{(j)}(v)\,da^{ij}(v).$

Similarly we write $\int_{t_0}^{t} A(v)\,d\Sigma(v)A'(v)$ for the matrix whose (i, j)th element is

$\sum_{k=1}^{p}\sum_{l=1}^{p}\int_{t_0}^{t} a^{ik}(v)a^{jl}(v)\,d\sigma^{kl}(v).$

By a partition $\Pi(t_0, t_l; l; m)$ of $[t_0, t_l]$, we mean a finite number of t-values

$t_0 = t_{l,m,0} < t_{l,m,1} < \cdots < t_{l,m,n(l;m)} = t_l, \qquad t'_{l,m,j},\ j = 1, 2, \dots, n(l; m),$

where $t_{l,m,j-1} \le t'_{l,m,j} \le t_{l,m,j}$. The norm of the partition is $\delta(l; m) = \max_j(t_{l,m,j} - t_{l,m,j-1})$.

THEOREM 2. Let $Z(v)$ be a p-dimensional additive process, defined and continuous in probability for $t \ge t_0$, whose l.c.f. is given by (7). Let $A(t)$ be a real-valued $p \times p$ matrix whose elements are continuous and of b.v. in the closed interval $[t_0, T]$. Given any positive integer N and any values $t_1 < t_2 < \cdots < t_N$ of t in $(t_0, T]$, let $\{\Pi_m\} = \{\Pi(t_0, t_l; l; m);\ l = 1, 2, \dots, N;\ m = 1, 2, \dots\}$ be a sequence of sets of partitions of the set of intervals $\{[t_0, t_l],\ l = 1, 2, \dots, N\}$ whose norms $\delta(l; m) \to 0$ for $l = 1, 2, \dots, N$, as $m \to \infty$. For any m, let

(9)    $S(t_0, t_l; l; m) = \sum_{j=1}^{n(l;m)} A(t'_{l,m,j})\{Z(t_{l,m,j}) - Z(t_{l,m,j-1})\}.$

Then, as $m \to \infty$,

(a) the sequence of sets of r.v. $\{S(t_0, t_l; l; m),\ l = 1, \dots, N\}$ converges in distribution to a set of r.v.

(10)    $Z(A; t_0, t_l), \quad l = 1, \dots, N,$

whose distribution is independent of $\{\Pi_m\}$;

(b) the l.c.f. of $Z(A; t_0, t)$ is

(11)    $\Psi(t_0, t) = iu'\Big\{[A(v)\mu(v)]_{t_0}^{t} - \int_{t_0}^{t} dA(v)\mu(v)\Big\} - \tfrac12u'\Big\{\int_{t_0}^{t} A(v)\,d\Sigma(v)A'(v)\Big\}u + \int_{R(p)\times(t_0,t)}\{e^{iu'A(v)x} - 1 - iu'A(v)x(1 + x'x)^{-1}\}(1 + x'x)(x'x)^{-1}\,d_{x,v}G(x; v);$

(c) the joint distribution of the set of r.v. (10) is the same as that of the set

(12)    $\sum_{j=1}^{l} Z(A; t_{j-1}, t_j), \quad l = 1, \dots, N,$

where the $Z(A; t_{j-1}, t_j)$, $j = 1, 2, \dots, N$, are mutually independent and the l.c.f. of $Z(A; t_{j-1}, t_j)$ is $\Psi(t_{j-1}, t_j)$.

PROOF. First taking $N = 1$, we show that the sequence of approximating sums (9) converges in distribution to a vector r.v. whose l.c.f. is $\Psi(t_0, t_1)$. For simplicity, we shall drop the suffix m from the t-values, and also write $f(u, x; v)$ for the integrand in the third term of $\Psi(t_0, t_1)$, T for the interval $[t_0, t_1]$ and $T_j$ for the interval $[t_{1,j-1}, t_{1,j})$.

Since the $Z(t_{1,j}) - Z(t_{1,j-1})$, $j = 1, 2, \dots, n(1; m)$, are mutually independent,

(13)    $\log E\{e^{iu'S(t_0, t_1; 1; m)}\} = \sum_{j=1}^{n}\big[\psi\{A'(t'_{1,j})u; t_{1,j}\} - \psi\{A'(t'_{1,j})u; t_{1,j-1}\}\big]$
        $= i\sum_{j=1}^{n} u'A(t'_{1,j})\{\mu(t_{1,j}) - \mu(t_{1,j-1})\}$
        $\quad - \tfrac12\sum_{j=1}^{n} u'A(t'_{1,j})\{\Sigma(t_{1,j}) - \Sigma(t_{1,j-1})\}A'(t'_{1,j})u$
        $\quad + \sum_{j=1}^{n}\int_{R(p)} f(u, x; t'_{1,j})\,d_x\{G(x; t_{1,j}) - G(x; t_{1,j-1})\}$
        $= I_1(m) + I_2(m) + I_3(m)$, say.





It is easily seen that, given any positive U, $I_1(m)$ converges to the first term of $\Psi(t_0, t_1)$, the convergence being uniform for all u such that $u'u \le U$. Similarly, $I_2(m)$ converges uniformly to the second term of $\Psi(t_0, t_1)$. Finally,

$I_3(m) = \sum_{j=1}^{n}\int_{R(p)\times T_j} f(u, x; t'_{1,j})\,d_{x,v}G(x; v).$


Now, let $S_c$ be the set of all x such that $x'x \le c$, and let $S'_c = R(p) - S_c$. For $v \in T$, the elements of $A(v)$ are bounded in absolute value, by K say. Hence, for $u'u \le U$, $x \in S'_c$ and $v \in T$, we have

$|f(u, x; v)| = |e^{iu'A(v)x} - 1 - iu'A(v)x(1 + x'x)^{-1}|\,|(1 + x'x)(x'x)^{-1}| < 2(1 + c^{-1}) + pKU^{1/2}c^{-1/2} \le 5$ if $c \ge c_0 = \max(1, p^2K^2U)$.

From (8), we know that $\int_{S'_c\times T} d_{x,v}G(x; v) < \epsilon$ for $c > c_\epsilon$. Let us take $c = \max(c_0, c_\epsilon)$ and write $S$ for $S_c$ and $S'$ for $S'_c$. Then

$\sum_{j=1}^{n}\int_{S'\times T_j}|f(u, x; t'_{1,j})|\,d_{x,v}G(x; v) < 5\epsilon$

and $\int_{S'\times T}|f(u, x; v)|\,d_{x,v}G(x; v) < 5\epsilon$. Hence,

$\Big|I_3(m) - \int_{R(p)\times T} f(u, x; v)\,d_{x,v}G(x; v)\Big| < \sum_{j=1}^{n}\int_{S\times T_j}|f(u, x; t'_{1,j}) - f(u, x; v)|\,d_{x,v}G(x; v) + 10\epsilon.$


From considerations of uniform continuity, it can be seen that the first term on the right-hand side is of order $\epsilon$ for sufficiently small $\delta(1; m)$ and all u such that $u'u \le U$. Consequently, $I_3(m)$ converges uniformly, for all u such that $u'u \le U$, to the third term of $\Psi(t_0, t_1)$. By the continuity theorem for characteristic functions, it follows that $\{S(t_0, t_1; 1; m),\ m = 1, 2, \dots\}$ converges in distribution to a random vector $Z(A; t_0, t_1)$ whose l.c.f. is $\Psi(t_0, t_1)$ and hence independent of the sequence of partitions.
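
A scalar Gaussian special case makes the convergence of the approximating sums concrete. The sketch below is an illustration under stated assumptions, not part of the paper: it takes $p = 1$, $\mu(t) = 0$, $\Sigma(t) = t$ and $G = 0$ (a Wiener process), so that the sum (9) is normal with mean zero and its l.c.f. is $-\tfrac12u^2$ times the variance $\sum_j a(t'_j)^2(t_j - t_{j-1})$. The second term of the limiting l.c.f. then predicts the variance $\int a(v)^2\,dv$, whatever intermediate points $t'_j$ are chosen. The function $a(v) = e^{-v}$ and the helper name are hypothetical.

```python
import math

def stieltjes_variance(a, t0, t1, n, tag):
    """Variance of sum_j a(t'_j){Z(t_j) - Z(t_{j-1})} when Var[Z(t) - Z(s)] = t - s."""
    step = (t1 - t0) / n
    total = 0.0
    for j in range(1, n + 1):
        left = t0 + (j - 1) * step
        t_tag = left + tag * step        # tag in [0, 1] places t'_j inside [t_{j-1}, t_j]
        total += a(t_tag) ** 2 * step
    return total

def a(v):
    return math.exp(-v)

exact = (1.0 - math.exp(-2.0)) / 2.0     # integral of exp(-2v) over [0, 1]
errors = {}
for n in (4, 64, 1024, 4096):
    errors[n] = [abs(stieltjes_variance(a, 0.0, 1.0, n, tag) - exact) for tag in (0.0, 0.5, 1.0)]
    print(n, errors[n])
```

The errors shrink with the norm of the partition for every choice of tag, in line with the statement that the limit does not depend on the sequence of partitions.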

To prove (c), it is enough to indicate the proof for $N = 2$. Let $t_{2m}$ be the largest $t_{2,m,j} \le t_1$, $j = 1, 2, \dots, n(2; m)$. Then

(15)    $S(t_0, t_2; 2; m) = S(t_0, t_{2m}; 2; m) + S(t_{2m}, t_2; 2; m).$

The two terms on the right-hand side of (15) are mutually independent vector r.v. which, on account of (14), converge in distribution respectively to $Z(A; t_0, t_1)$ and $Z(A; t_1, t_2)$.

Now, let $\Pi^*(t_0, t_1; m)$ be the partition obtained by superposing $\Pi(t_0, t_1; 1; m)$ on $\Pi(t_0, t_{2m}; 2; m)$ and let $S^*(t_0, t_1; m)$ be the corresponding approximating sum. By making use of the fact that $A(t)$ is of b.v. and that $Z(t)$ is uniformly continuous in probability in every closed interval in which it is pointwise continuous, we can show that

$S^*(t_0, t_1; m) - S(t_0, t_1; 1; m)$ and $S^*(t_0, t_1; m) - S(t_0, t_{2m}; 2; m)$

both converge in probability to zero. Hence so does

$S(t_0, t_1; 1; m) - S(t_0, t_{2m}; 2; m).$

But if $X_m$, $Y_m$ are mutually independent and converge in distribution to X, Y, and $\theta_m \to 0$ in probability, then $\{X_m + \theta_m, X_m + Y_m\}$ converges in distribution to $(X, X + Y)$, X and Y being mutually independent. Therefore, the set of r.v. $\{S(t_0, t_1; 1; m), S(t_0, t_2; 2; m)\}$ converges in distribution to $\{Z(A; t_0, t_1), Z(A; t_0, t_1) + Z(A; t_1, t_2)\}$, where $Z(A; t_0, t_1)$ and $Z(A; t_1, t_2)$ are independent.

REMARKS. The fact that $A(t)$ is of b.v. has been used in the proof only in establishing the convergence of the first term in (13). Therefore, if $\mu(t)$ happens to be itself of b.v., the theorem certainly holds for any continuous $A(t)$.

Incidentally, we notice that if $p = 1$, $A(t) = t$ and $G(x; t) = G_1(x)G_2(t)$, where $G_2(t)$ is a distribution function on $R(1)$ with a finite second moment, and if $\mu(t)$ and $\Sigma(t)$ are such that the first two terms of (11) have finite limits as $t_0 \to -\infty$ and $t \to \infty$, then the limit of the l.c.f. (11) is a g-function of Kallianpur and Robbins [5].

The definition of $Z(A; t_0, t)$ as the limit-in-distribution of a sequence of approximating sums suggests the formal equation

(16)    $Z(A; t_0, t) = {}_{D}\!\int_{t_0}^{t} A(v)\,dZ(v).$

The D before the integral sign is a reminder of the fact that this is an "integral-in-distribution." We note that for $t_0 < t_1 < t_2$, the distribution of ${}_{D}\!\int_{t_0}^{t_2} A(v)\,dZ(v)$ is the convolution of the distributions of ${}_{D}\!\int_{t_0}^{t_1} A(v)\,dZ(v)$ and ${}_{D}\!\int_{t_1}^{t_2} A(v)\,dZ(v)$. In other words, for fixed $t_0$ and variable $t \ge t_0$, the random function ${}_{D}\!\int_{t_0}^{t} A(v)\,dZ(v)$ is an additive process. Its l.c.f. $\Psi(t_0, t)$ should, therefore, be capable of being displayed in the general form (7). This was actually found to be possible with some restrictions on $A(t)$; how it can be done in general is not known. Anyway, the resulting expression seems too cumbrous for use.

It may also be noted that the formal equation (16) actually represents the true relation between the l.c.f.'s of its two members. The increments of $Z(t)$ are mutually independent and the l.c.f. of $A(v)\,dZ(v)$ is $[d_t\psi\{A'(v)u; t\}]_{t=v}$. Thus we expect, by an extension of the law concerning the l.c.f. of a convolution of finite order, that the l.c.f. of the right-hand side of (16) is

$\int_{t_0 \le v \le t}[d_t\psi\{A'(v)u; t\}]_{t=v},$

if at all such a thing exists. Actually, this is the same as $\Psi(t_0, t)$ in (11).







In allowing Z(t) the freedom of the wide class of additive processes, we have 
restricted ourselves necessarily to a rather weak limiting process for the definition 
of the integral, namely the limit-in-distribution. Stronger definitions of the 
integral have already been in use for some time. For instance, if Z(t) has finite 
covariance and orthogonal increments, the integral defined as a limit in the 
mean is a random variable which with probability 1 is uniquely determined. 
Since there are additive processes which do not have a finite covariance, this 
definition of the integral does not suit our purpose. 

Now, let $A_1(t)$ and $A_2(t)$ be continuous and of b.v. in an interval $[t_0, T]$; then so is $A_2(t)A_1(t)$, and hence

$Z(A_2A_1; t_0, t) = {}_{D}\!\int_{t_0}^{t} A_2(v)A_1(v)\,dZ(v)$

exists and, for fixed $t_0$ and variable t, is an additive process. So also

$Z(A_2, A_1; t_0, t) = {}_{D}\!\int_{t_0}^{t} A_2(v)\,d_vZ(A_1; t_0, v)$

is an additive process.

NOTATION. For $X(t)$ and $Y(t)$, two random functions defined for all t in a domain T, we shall write $X(t) \overset{D}{=} Y(t)$ to imply that, for any n and any $t_1, t_2, \dots, t_n$ in T, the joint distribution of $X(t_1), \dots, X(t_n)$ is the same as that of $Y(t_1), \dots, Y(t_n)$.

THEOREM 3. If $A_1(t)$ and $A_2(t)$ are continuous and of b.v. in $[t_0, T]$, then

$Z(A_2, A_1; t_0, t) \overset{D}{=} Z(A_2A_1; t_0, t)$

for all t in $[t_0, T]$.

PROOF. Given $t_0 < t_1 < \cdots < t_n$, let $Y(j) = Z(A_2A_1; t_0, t_j) - Z(A_2A_1; t_0, t_{j-1})$, and let $\psi_j(u)$ be the l.c.f. of $Z(A_2A_1; t_0, t_j)$, so that the l.c.f. of $Y(j)$ is $\psi_j(u) - \psi_{j-1}(u)$. Then

l.c.f. $\{Z(A_2A_1; t_0, t_j),\ j = 1, \dots, n\} = \log E\Big\{\exp\Big[i\sum_{j=1}^{n} u'(j)Z(A_2A_1; t_0, t_j)\Big]\Big\}$
$= \log E\Big\{\exp\Big[i\sum_{j=1}^{n}\sum_{k=j}^{n} u'(k)Y(j)\Big]\Big\}$
$= \sum_{j=1}^{n}\psi_j\Big\{\sum_{k=j}^{n} u(k)\Big\} - \sum_{j=1}^{n}\psi_{j-1}\Big\{\sum_{k=j}^{n} u(k)\Big\}.$

In the case of $Z(A_2, A_1; t_0, t)$, we have a similar expression in terms of the l.c.f. of $Z(A_2, A_1; t_0, t_j)$. Hence, it is enough to show that, for any t, the l.c.f.'s of $Z(A_2A_1; t_0, t)$ and $Z(A_2, A_1; t_0, t)$ are the same. For this, we can derive the l.c.f. of $Z(A_2, A_1; t_0, t)$ by the procedure used in Theorem 2, and then reduce it to the form (11) with $A(v) = A_2(v)A_1(v)$, which is the l.c.f. of $Z(A_2A_1; t_0, t)$. Hence the distributions of $Z(A_2, A_1; t_0, t)$ are the same as those of $Z(A_2A_1; t_0, t)$; that is to say, the formal equation

${}_{D}\!\int_{t_0}^{t} A_2(v)\,d_v\;{}_{D}\!\int_{t_0}^{v} A_1(u)\,dZ(u) = {}_{D}\!\int_{t_0}^{t} A_2(v)A_1(v)\,dZ(v)$

is rigorously true.

COROLLARY 2. Given any $Z(t)$ and $A(t)$ satisfying the conditions of Theorem 2 and the additional condition that the determinant $|A(t)|$ be bounded away from zero in $[t_0, T]$, there exists an additive process $Z^*(t)$ such that

$Z(t) - Z(t_0) \overset{D}{=} {}_{D}\!\int_{t_0}^{t} A(v)\,dZ^*(v).$

In fact, $Z^*(t) = Z(A^{-1}; t_0, t)$.

PROOF. The result follows immediately from the fact that, since the determinant and minor determinants of $A(t)$ are of b.v. and $|A(t)|$ is bounded away from zero, $A^{-1}(t)$ is of b.v.


3. Exponential variation. Going back to the $X(t)$ with which we started, we have the following result.

THEOREM 4. If $X(t)$ and $A(h)$ satisfy the assumptions of Theorem 1, then there exist an additive process $Z(t)$ and a constant matrix A such that

(17)    $X(t) \overset{D}{=} e^{(t - t_0)A}X(t_0) + {}_{D}\!\int_{t_0}^{t} e^{(t - v)A}\,dZ(v), \quad t \ge t_0.$

Conversely, if A is any constant matrix, $Z(t)$ any additive process, and $X(t_0)$ a r.v. independent of all $Z(t + h) - Z(t)$ for $t \ge t_0$ and $h > 0$, then an $X(t)$ satisfying (17) has the property that $X(t + h) - e^{hA}X(t)$ is independent of all $X(t')$ for $t_0 \le t' \le t$.

PROOF. From Theorem 1, we know that $X(t) = e^{tA}X^*(t)$, where $X^*(t)$ is an additive process; and since $e^{-tA}$ has the properties of the $A(t)$ of Corollary 2, we have

$X^*(t) - X^*(t_0) \overset{D}{=} {}_{D}\!\int_{t_0}^{t} e^{-vA}\,dZ(v).$

This immediately gives (17); the converse is obvious.

REMARKS. Random functions of the nature of (17) arise in connection with certain physical processes. For instance, Loève [10] has considered the problem of a reservoir of water which is losing its contents exponentially and gaining from random precipitation. In this case, $Z(t)$ is a Poisson process with jumps of variable magnitude. On account of such applications and equation (17), one may refer to $X(t)$ as the result of exponential variation of an additive process. Now, it may happen that such an $X(t)$ itself undergoes an exponential variation. We shall now deal with the result of such an iterated exponential variation.
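
The reservoir example can be simulated in the scalar case to see the defining property at work. The sketch below is an illustration only and rests on assumptions not in the paper: $p = 1$, $A = -\texttt{decay}$ with a made-up decay rate, and a compound Poisson $Z(t)$ whose jump sizes are taken to be exponentially distributed. With these choices the representation (17) becomes a finite sum over the jumps, and $X(t + h) - e^{hA}X(t)$ is seen to depend only on the jumps of $Z$ in $(t, t + h]$. All names are hypothetical.

```python
import math
import random

def simulate_jumps(rate, mean_size, t0, t_end, rng):
    """Jump times and sizes of a compound Poisson process on (t0, t_end]."""
    jumps, t = [], t0
    while True:
        t += rng.expovariate(rate)                      # exponential waiting time
        if t > t_end:
            return jumps
        jumps.append((t, rng.expovariate(1.0 / mean_size)))

def level(t, t0, x0, decay, jumps):
    """X(t) from (17) with scalar A = -decay and dZ concentrated on the jumps."""
    carried = x0 * math.exp(-decay * (t - t0))
    added = sum(size * math.exp(-decay * (t - tau)) for tau, size in jumps if tau <= t)
    return carried + added

def innovation(t, h, decay, jumps):
    """Integral of exp((t+h-v)A) dZ(v) over (t, t+h]: only jumps in that interval enter."""
    return sum(size * math.exp(-decay * (t + h - tau))
               for tau, size in jumps if t < tau <= t + h)

rng = random.Random(1953)
t0, x0, decay = 0.0, 10.0, 0.3
jumps = simulate_jumps(rate=2.0, mean_size=1.5, t0=t0, t_end=20.0, rng=rng)
gaps = []
for t, h in [(1.0, 0.5), (4.0, 2.0), (7.5, 10.0)]:
    lhs = level(t + h, t0, x0, decay, jumps) - math.exp(-decay * h) * level(t, t0, x0, decay, jumps)
    gaps.append(abs(lhs - innovation(t, h, decay, jumps)))
    print(t, h, lhs, innovation(t, h, decay, jumps))
```

Since the jumps in $(t, t + h]$ are independent of everything that happened up to time t, the independence asserted in the converse part of Theorem 4 follows in this special case.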

THEOREM 5. Let $X_1(t)$ be a random function satisfying the assumptions of Theorem 1, and consequently such that

(18)    $X_1(t) \overset{D}{=} e^{(t - t_0)A_1}X_1(t_0) + {}_{D}\!\int_{t_0}^{t} e^{(t - v)A_1}\,dZ(v).$

Let $A_2$ be a $p \times p$ matrix such that $A_1A_2 = A_2A_1$ and $|A_1 - A_2| \ne 0$. Given any positive integer N and any values $t_1 < t_2 < t_3 < \cdots < t_N$ of t, with $t_1 \ge t_0$, let $\{\Pi_m\} = \{\Pi(t_0, t_l; l; m),\ l = 1, 2, \dots, N;\ m = 1, 2, \dots\}$ be a sequence of sets of partitions of the set of intervals $\{[t_0, t_l],\ l = 1, 2, \dots, N\}$ whose norms $\delta(l; m) \to 0$ for $l = 1, 2, \dots, N$, as $m \to \infty$. For any m, let

(19)    $S(t_0, t_l; l; m) = \sum_{j=1}^{n(l;m)} e^{(t_l - t'_{l,m,j})A_2}\{X_1(t_{l,m,j}) - X_1(t_{l,m,j-1})\}.$

Then, as $m \to \infty$,

(a) the set of r.v. $\{S(t_0, t_l; l; m),\ l = 1, 2, \dots, N;\ m = 1, 2, \dots\}$ converges in distribution to a set of r.v. $\{X_2(t_l),\ l = 1, 2, \dots, N\}$ whose distribution is independent of $\{\Pi_m\}$;

(b) in fact, for varying t,

(20)    $X_2(t) \overset{D}{=} A_1(A_1 - A_2)^{-1}\{e^{(t - t_0)A_1} - e^{(t - t_0)A_2}\}X_1(t_0) + {}_{D}\!\int_{t_0}^{t}\{A_1e^{(t - v)A_1} - A_2e^{(t - v)A_2}\}(A_1 - A_2)^{-1}\,dZ(v);$

(c) also, $X_2(t)$ has the property that

(21)    $X_2(t + 2h) - (e^{hA_1} + e^{hA_2})X_2(t + h) + e^{h(A_1 + A_2)}X_2(t)$

is independent of all $X_2(t')$, $t_0 \le t' \le t$.

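Proof. Let $e^{-tA_1}X_1(t) = X^*(t)$, so that $X^*(t)$ is an additive process. For convenience, we shall write $t_{l,j}$ for $t_{l,m,j}$ and $\Delta_{l,j}$ for $\{X^*(t_{l,m,j}) - X^*(t_{l,m,j-1})\}$. Then

(22) $S(t_0, t_l; l; m) = \sum_{j=1}^{n(l,m)} e^{(t_l - t'_{l,j})A_2}\left[e^{t_{l,j}A_1}\{X^*(t_0) + \Delta_{l,1} + \cdots + \Delta_{l,j}\} - e^{t_{l,j-1}A_1}\{X^*(t_0) + \Delta_{l,1} + \cdots + \Delta_{l,j-1}\}\right]$

$\qquad = \sum_{j=1}^{n(l,m)} A_{l,j}X^*(t_0) + \sum_{j=1}^{n(l,m)} B_{l,j}\Delta_{l,j},$

where

$A_{l,j} = e^{(t_l - t'_{l,j})A_2}\left(e^{t_{l,j}A_1} - e^{t_{l,j-1}A_1}\right),$

$B_{l,j} = e^{(t_l - t'_{l,j})A_2 + t_{l,j}A_1} + \sum_{k=j+1}^{n(l,m)} e^{(t_l - t'_{l,k})A_2}\left(e^{t_{l,k}A_1} - e^{t_{l,k-1}A_1}\right).$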

The two terms on the right-hand side in (22) are independent. The first converges in distribution to

(23) $\int_{t_0}^{t_l} e^{(t_l - v)A_2}\,d[e^{vA_1}]\,X^*(t_0) = \alpha(t_l)X^*(t_0)$, say.







By a procedure similar to that used in the proof of Theorem 2, we see that the second term on the right-hand side of (22) converges in distribution to

(24) $\int_{t_0}^{t_l}\left\{e^{(t_l - v)A_2 + vA_1} + \int_{v}^{t_l} e^{(t_l - u)A_2}\,d[e^{uA_1}]\right\}dX^*(v) = Y(t_l)$, say.

From (23), (24) and Theorem 3, it follows that for any $t_l$, $S(t_0, t_l; l; m)$ converges in distribution to $X_2(t_l)$ as given by (20). To establish the simultaneous convergence in distribution of the set (19) to $\{X_2(t_l),\ l = 1, 2, \cdots, N\}$, we can again use the procedure of the corresponding part of the proof of Theorem 2. Finally, from (b) we immediately have (c).

Remarks. The assumption that $|A_1 - A_2| \ne 0$ is necessary only for the final reduction of $X_2(t)$ to the form (20). Without this restriction (a) is still true, and also (b), except that in (20) the right-hand side has to be replaced by $\{\alpha(t)X^*(t_0) + Y(t)\}$ from (23) and (24); it follows that (c) holds. In fact, we have results of this type even if neither of the assumptions $A_1A_2 = A_2A_1$ and $|A_1 - A_2| \ne 0$ is satisfied.

Thus we see that whereas the result of an exponential decay, defined as a 
limit in distribution, of an additive process is a random function satisfying a 
linear relation of the first order, the result of an exponential decay of such a 
random function satisfies a linear relation of the second order. Incidentally, 
we have come across a wide class of random functions with respect to which 
one can define Riemann-Stieltjes "integrals-in-distribution." This raises the
question of characterizing the class of all random functions having this property. 
We have seen that this class is wider than that of random functions with inde- 
pendent increments; but its total content is not known. 

It may be noted that equation (17) is formally the same as the solution of the set of differential equations

$dX - AX\,dt = dZ(t), \qquad t \ge t_0.$

Likewise, (20) is formally the same as the solution of the set $dX_2 - A_2X_2\,dt = dX_1(t)$, that is,

$d^2X_2 - (A_1 + A_2)\,dX_2\,dt + A_1A_2X_2\,(dt)^2 = d^2Z(t).$


In fact, when derivatives in mean square exist, the solutions of these equations 
are rigorously given in terms of integrals in the mean. We have now seen that 
the formalism holds in terms of integrals in distribution, when we have on the 
right-hand side of the equations random functions of certain types. 


4. Acknowledgement. The author wishes to express his thanks to Dr. Herbert Robbins for advice and help, and to the Institute of Statistics of the University of North Carolina, in particular to Dr. Harold Hotelling, for the liberal financial assistance and other facilities given to the author during the course of this work.





S. G. GHURYE


REFERENCES 
[1] J. L. Doob, Stochastic Processes, John Wiley and Sons, 1953.
[2] W. Feller, "Neuer Beweis für die Kolmogoroff-P. Lévysche Charakterisierung," Bull. Internat. Acad. Yougoslave Cl. Sci. Math. Phys. Tech., Vol. 32 (1939), pp. 106-113.
[3] S. G. Ghurye, "Random functions satisfying certain linear relations," (abstract), Ann. Math. Stat., Vol. 23 (1952), p. 646.
[4] B. V. Gnedenko, "Limit theorems for sums of independent random variables," Uspehi Matem. Nauk, Vol. 10 (1944), pp. 115-165.
[5] G. Kallianpur and H. Robbins, "On a class of infinitely divisible distributions," (abstract), Ann. Math. Stat., Vol. 23 (1952), p. 146.
[6] A. Khintchine, "Déduction nouvelle d'une formule de M. Paul Lévy," Bull. Univ. d'état Mosc., série internationale, Sec. A, Vol. 1 (1937), pp. 1-5.
[7] Paul Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris, 1937.
[8] Paul Lévy, Processus Stochastiques et Mouvement Brownien, Gauthier-Villars, Paris, 1948.
[9] Paul Lévy, "Random functions: General theory with special reference to Laplacian random functions," Univ. California Publ. Stat., Vol. 1 (1953), pp. 331-390.
[10] M. Loève, "Schéma des débits," (unpublished, written in 1944).
[11] B. von Sz. Nagy, "Über messbare Darstellungen Liescher Gruppen," Math. Ann., Vol. 112 (1931), pp. 286-296.



TRUNCATED LIFE TESTS IN THE EXPONENTIAL CASE 


By Benjamin Epstein
Wayne University¹


1. Introduction and Summary. It is frequently desirable on practical grounds to terminate a life test by a preassigned time $T_0$. In this paper we consider life tests which are truncated as follows. With $n$ items placed on test, it is decided in advance that the experiment will be terminated at $\min(X_{r_0,n}, T_0)$, where $X_{r_0,n}$ is a random variable equal to the time at which the $r_0$th failure occurs and $T_0$ is a truncation time, beyond which the experiment will not be run. Both $r_0$ and $T_0$ are assigned before experimentation starts. If the experiment is terminated at $X_{r_0,n}$ (that is, if $r_0$ failures occur before time $T_0$), then the action in terms of hypothesis testing is the rejection of some specified null-hypothesis. If the experiment is terminated at time $T_0$ (that is, if the $r_0$th failure does not occur before time $T_0$), then the action in terms of hypothesis testing is the acceptance of some specified null-hypothesis.

While truncated procedures can be considered for any life distribution, we limit ourselves here to the case where the underlying life distribution is specified by a p.d.f. of the exponential form, $f(x; \theta) = \theta^{-1}e^{-x/\theta}$, $x > 0$, $\theta > 0$. The practical justification for using this kind of distribution as a first approximation to a number of test situations is discussed in a recent paper by Davis [1]. It is a common assumption for electron tube life.

Two situations are considered. The first is the nonreplacement case in which a failure occurring during the test is not replaced by a new item. The second is the replacement case where failed items are replaced at once by new items drawn at random from the same p.d.f. as the original $n$ items. Formulae are given for $E_\theta(r)$, the expected number of observations to reach a decision; for $E_\theta(T)$, the expected waiting time to reach a decision; and for $L(\theta)$, the probability of accepting the hypothesis that $\theta = \theta_0$, the value associated with the null-hypothesis, when $\theta$ is the true value. Some procedures are worked out for finding truncated tests meeting specified conditions, and practical illustrations are given.

It is an intrinsic feature of all life test decision procedures that they are in some sense truncated, although not necessarily by a fixed time $T_0$. In Section 3 we give exact formulae for $E_\theta(r)$ and $E_\theta(T)$ for a decision procedure given in [2]. There is a close relation between these results and those in Section 2.


2. Derivation of a truncated test in the nonreplacement and replacement case. It will be assumed throughout this section that the underlying p.d.f. of the life of items is given by $f(x; \theta) = \theta^{-1}e^{-x/\theta}$, $x > 0$, $\theta > 0$. In the nonreplacement case, $n$ items are drawn at random from the population and placed on life test. Items which fail are not replaced and the experiment is truncated at time $\min(X_{r_0,n}, T_0)$, where $X_{r_0,n}$ is the time when the $r_0$th failure occurs and $r_0$ and $T_0$ will be taken as preassigned. $T_0$ is a truncation time beyond which the experiment does not run. The variate is considered to be time for convenience only. It is perfectly clear that it can be other things, depending on the physical applications one is concerned with. Generally the variate will be nonnegative.

Received 2/6/53, revised 3/1/54.
¹ Work done with the support of the Office of Naval Research.

Since the probability of an item failing before time $T_0$ is given by $p_\theta = 1 - e^{-T_0/\theta}$, it follows from the binomial law that the probability of reaching a decision requiring exactly $k$ failures is

(1) $\Pr(r = k \mid \theta) = b(k; n, p_\theta) = \binom{n}{k}p_\theta^k(1 - p_\theta)^{n-k}, \qquad k = 0, 1, 2, \cdots, r_0 - 1,$

(2) $\Pr(r = r_0 \mid \theta) = 1 - \sum_{k=0}^{r_0-1} b(k; n, p_\theta).$

The expected number of observations to reach a decision is

(3) $E_\theta(r) = \sum_{k=0}^{r_0} k\Pr(r = k \mid \theta).$

It can be readily shown that (3) simplifies to

(4) $E_\theta(r) = np_\theta\left[\sum_{k=0}^{r_0-2} b(k; n - 1, p_\theta)\right] + r_0\left[1 - \sum_{k=0}^{r_0-1} b(k; n, p_\theta)\right].$

This is in a convenient form for calculation. For any preassigned $n$, $T_0$, and $r_0$, $E_\theta(r)$ can be found easily from the Binomial Tables [8] or the Tables of the Incomplete Beta Function [6].
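The step from (3) to (4) can be checked numerically. The following is a minimal sketch, not the author's computing procedure: the function names are hypothetical, and the inputs $n = 42$, $r_0 = 5$, $T_0 = 500$, $\theta = 10{,}000$ are borrowed from Problem 2 of Section 5 purely as an illustration.

```python
import math

def binom_pmf(k, n, p):
    """b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def expected_failures_direct(n, r0, T0, theta):
    """E_theta(r) from definition (3), with Pr(r = k) taken from (1) and (2)."""
    p = 1 - math.exp(-T0 / theta)
    probs = [binom_pmf(k, n, p) for k in range(r0)]   # k = 0, ..., r0 - 1
    probs.append(1 - sum(probs))                      # Pr(r = r0), eq. (2)
    return sum(k * pk for k, pk in enumerate(probs))

def expected_failures_closed(n, r0, T0, theta):
    """E_theta(r) from the simplified form (4)."""
    p = 1 - math.exp(-T0 / theta)
    head = sum(binom_pmf(k, n - 1, p) for k in range(r0 - 1))  # k = 0..r0-2
    tail = 1 - sum(binom_pmf(k, n, p) for k in range(r0))
    return n * p * head + r0 * tail

direct = expected_failures_direct(42, 5, 500, 10_000)
closed = expected_failures_closed(42, 5, 500, 10_000)
```

The two values agree to rounding error, and both are close to 2; small differences from tabulated figures are to be expected when $p_\theta$ is rounded before entering the tables.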

We now wish to prove that $E_\theta(T)$, the expected waiting time to reach a decision based on the stopping rule $\min(X_{r_0,n}, T_0)$, is

(5) $E_\theta(T) = \sum_{k=1}^{r_0}\Pr(r = k \mid \theta)E_\theta(X_{k,n}),$

where $E_\theta(X_{k,n})$ is the unconditional expected waiting time (measured from $t = 0$) to observe the $k$th failure in the random sample of size $n$ drawn from the underlying exponential p.d.f.

To prove (5), we note first that $E_\theta(T)$ is

(6) $E_\theta(T) = T_0\left[\sum_{k=0}^{r_0-1} b(k; n, p_\theta)\right] + \sum_{k=r_0}^{n} b(k; n, p_\theta)E_\theta(X_{r_0,n} \mid r = k).$

Furthermore $E_\theta(X_{r_0,n})$, the unconditional expected waiting time to get the $r_0$th failure, is

(7) $E_\theta(X_{r_0,n}) = \sum_{k=0}^{r_0-1} b(k; n, p_\theta)E_\theta(X_{r_0,n} \mid r = k) + \sum_{k=r_0}^{n} b(k; n, p_\theta)E_\theta(X_{r_0,n} \mid r = k).$

From (6) and (7) we get

(8) $E_\theta(T) = E_\theta(X_{r_0,n}) + \sum_{k=0}^{r_0-1} b(k; n, p_\theta)[T_0 - E_\theta(X_{r_0,n} \mid r = k)].$

Since the underlying distribution is exponential, it can be verified in the nonreplacement case from results in [3] that

(9) $E_\theta(X_{r_0,n} \mid r = k) = T_0 + E_\theta(X_{r_0-k,n-k}), \qquad k = 1, 2, \cdots, r_0 - 1,$

where $E_\theta(X_{r_0-k,n-k})$ is the unconditional expected waiting time to get the $(r_0 - k)$th failure in a random sample of size $(n - k)$. It has been shown in [2] that for $1 \le k \le n$,

(10) $E_\theta(X_{k,n}) = \theta\left(\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{n-k+1}\right).$

Therefore

(11) $E_\theta(X_{r_0-k,n-k}) = E_\theta(X_{r_0,n}) - E_\theta(X_{k,n}).$

Using (9) and (11) in (8) gives the desired formula (5).
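Formulas (5), (10) and (11) lend themselves to a direct numerical check. The sketch below is illustrative only; the function names are hypothetical, and the plan $n = 42$, $r_0 = 5$, $T_0 = 500$ with $\theta = 10{,}000$ is again taken from Problem 2 of Section 5.

```python
import math

def expected_order_time(k, n, theta):
    """Formula (10): theta * (1/n + 1/(n-1) + ... + 1/(n-k+1))."""
    return theta * sum(1.0 / (n - i) for i in range(k))

def expected_waiting_time(n, r0, T0, theta):
    """Formula (5) for the nonreplacement case."""
    p = 1 - math.exp(-T0 / theta)
    probs = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r0)]
    probs.append(1 - sum(probs))        # Pr(r = r0), eq. (2)
    return sum(probs[k] * expected_order_time(k, n, theta)
               for k in range(1, r0 + 1))

# Identity (11) for one choice of r0, k, n (here r0 = 5, k = 2, n = 42).
lhs = expected_order_time(5 - 2, 42 - 2, 1.0)
rhs = expected_order_time(5, 42, 1.0) - expected_order_time(2, 42, 1.0)

wait = expected_waiting_time(42, 5, 500, 10_000)
```

Both sides of (11) coincide, and `wait` comes out a little below the truncation time of 500, as it must for a stopping time bounded by $T_0$.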

In the replacement case the test is started with $n$ items and any item that fails is replaced at once by a new item drawn at random from the underlying p.d.f.; thus the number of items under test is always $n$. As in the nonreplacement case, experimentation is truncated at time $\min(X_{r_0,n}, T_0)$, where $X_{r_0,n}$ is the time (measured from the beginning of the entire experiment) when the $r_0$th failure occurs, and $T_0$ is a preassigned truncation time.

Since the underlying distribution is exponential with mean life $\theta$, the replacement of failed items by new items makes the life test a Poisson process with occurrence rate $\lambda_\theta = n/\theta$.

Thus the probability of reaching a decision requiring exactly $k$ failures is

(12) $\Pr(r = k \mid \theta) = p(k; \lambda_\theta T_0) = e^{-nT_0/\theta}(nT_0/\theta)^k/k!, \qquad k = 0, 1, 2, \cdots, r_0 - 1,$

(13) $\Pr(r = r_0 \mid \theta) = 1 - \sum_{k=0}^{r_0-1} p(k; \lambda_\theta T_0).$

The expected number of observations to reach a decision is

(14) $E_\theta(r) = \sum_{k=0}^{r_0} k\Pr(r = k \mid \theta).$

It can readily be shown that (14) simplifies to

(15) $E_\theta(r) = \lambda_\theta T_0\left[\sum_{k=0}^{r_0-2} p(k; \lambda_\theta T_0)\right] + r_0\left[1 - \sum_{k=0}^{r_0-1} p(k; \lambda_\theta T_0)\right].$

This is in a convenient form for calculation. For any preassigned $n$, $T_0$ and $r_0$, $E_\theta(r)$ can be found easily from Molina's tables of the Poisson distribution [5] or from the tables of the incomplete $\Gamma$-function [7].





The expected waiting time to reach a decision is given by a particularly simple formula in the replacement case. It is

(16) $E_\theta(T) = (\theta/n)E_\theta(r).$

The proof of (16) is analogous to the proof of (5) in the nonreplacement case. Thus analogous to (8) we have

(17) $E_\theta(T) = E_\theta(X_{r_0,n}) + \sum_{k=0}^{r_0-1} p(k; \lambda_\theta T_0)[T_0 - E_\theta(X_{r_0,n} \mid r = k)].$

Analogous to (9) we have

(18) $E_\theta(X_{r_0,n} \mid r = k) = T_0 + E_\theta(X_{r_0-k,n}), \qquad k = 1, 2, \cdots, r_0 - 1.$

Furthermore

(19) $E_\theta(X_{r_0-k,n}) = E_\theta(X_{r_0,n}) - E_\theta(X_{k,n}) = (r_0 - k)\theta/n,$

since the unconditional expected waiting time to get exactly $s$ failures in a replacement situation is $E_\theta(X_{s,n}) = s\theta/n$, for any integer $s$. Substituting (18) and (19) in (17) yields (16).

It is interesting to note that in analogy with (5) in the nonreplacement case we can write (16) as

(20) $E_\theta(T) = \sum_{k=1}^{r_0}\Pr(r = k \mid \theta)E_\theta(X_{k,n}).$

The unconditional waiting times $E_\theta(X_{k,n})$ are given by (10) in the nonreplacement case, by $k\theta/n$ in the replacement case.

Suppose the truncation rule is such that the hypothesis $H_0: \theta = \theta_0$ is accepted if $\min(X_{r_0,n}, T_0) = T_0$, that is, if the waiting time required to observe $X_{r_0,n}$ is more than $T_0$. Then if $L(\theta)$ is defined as the probability of accepting $\theta = \theta_0$ when $\theta$ is true, it follows in either the replacement or the nonreplacement case that

(21) $L(\theta) = 1 - \Pr(r = r_0 \mid \theta),$

where $\Pr(r = r_0 \mid \theta)$ is given by (2) in the nonreplacement case and by (13) in the replacement case.


3. A test based on the first r out of n ordered observations. In [2] it was proved, in the nonreplacement case, that when testing the hypothesis $H_0: \theta = \theta_0$ against any simple alternative $\theta = \theta_1$ $(\theta_1 < \theta_0)$, the "best" region of acceptance for $H_0$ (in the sense of Neyman and Pearson), based on the first $r$ out of $n$ ordered observations from an exponential distribution, is of the form $\hat\theta_{r,n} > C$, where

(22) $\hat\theta_{r,n} = \left[\sum_{i=1}^{r} x_{i,n} + (n - r)x_{r,n}\right]\Big/ r,$

both $r$ and $n$ being preassigned integers.

One could interpret the decision rule $\hat\theta_{r,n} > C$ to mean that we wait until time $x_{r,n}$, then compute $\hat\theta_{r,n}$, and make the appropriate decision. However, this procedure clearly wastes information since we are able to observe the life test continuously. We will now show that, if continuous observation is taken into account, often we can shorten the waiting time to reach a decision and reduce the number of items failed. More precisely, suppose that at some moment $t$ there are exactly $k$ failures, $0 \le k \le r - 1$, and that the observed total life $V(t)$,

(23) $V(t) = \sum_{i=1}^{k} x_{i,n} + (n - k)t,$

is greater than $rC$. The $k$ items which fail by time $t$ contribute $\sum_{i=1}^{k} x_{i,n}$ to $V(t)$. The $(n - k)$ items which have not failed contribute the amount $(n - k)t$. In particular if $t = x_{r,n}$, then $V(x_{r,n}) = \sum_{i=1}^{r} x_{i,n} + (n - r)x_{r,n} = r\hat\theta_{r,n}$. Since $V(t)$ is monotonically increasing in $t$, we know that $V(x_{r,n}) \ge V(t) > rC$, and thus we should stop experimentation at time $t$ and accept $H_0$. More generally a decision rule having precisely the same O.C. curve as $\hat\theta_{r,n} > C$, but requiring on the average fewer failures and a shorter decision time, is as follows:

(a) Continue experimentation so long as $V(t) < rC$ and $0 \le k \le r - 1$.

(b) Stop experimentation with acceptance of $H_0$ as soon as $V(t) > rC$ and $0 \le k \le r - 1$.

(c) Stop experimentation at $x_{r,n}$, with rejection of $H_0$, if $V(t) < rC$ for all $t \le x_{r,n}$.

Note: This means that acceptance of $H_0$ takes place between failure times, and always before time $x_{r,n}$.
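Rules (a) to (c) can be sketched as a small routine. This is only one possible reading, under the assumption that the ordered failure times are supplied as a list and that any item without a listed failure time survives indefinitely; the function name and the sample data are hypothetical. Between failures $V(t)$ grows linearly with slope $(n - k)$, so the moment at which it reaches $rC$ can be computed exactly.

```python
def total_life_test(failure_times, n, r, C):
    """Apply rules (a)-(c) to ordered failure times x_{1,n} <= x_{2,n} <= ...

    Returns (decision, stopping_time, failures_observed).
    """
    threshold = r * C
    life_at_last_failure = 0.0   # V(t) at the most recent failure time
    last_time = 0.0
    for k in range(r):           # k failures so far, 0 <= k <= r - 1
        at_risk = n - k
        # Time at which V(t), rising with slope (n - k), would reach r*C.
        crossing = last_time + (threshold - life_at_last_failure) / at_risk
        next_failure = failure_times[k] if k < len(failure_times) else float("inf")
        if crossing < next_failure:
            return "accept", crossing, k              # rule (b)
        life_at_last_failure += at_risk * (next_failure - last_time)
        last_time = next_failure
    return "reject", last_time, r                     # rule (c)

# Hypothetical data: 20 items on test, r = 10, C = 815, three early failures.
accepted = total_life_test([100.0, 250.0, 300.0], n=20, r=10, C=815.0)
# Hypothetical data leading to rejection: both failures come very early.
rejected = total_life_test([1.0, 2.0], n=20, r=2, C=100.0)
```

In the first call the total life reaches $rC = 8150$ at $t = 300 + 2400/17$ with only three failures, so the test stops there with acceptance; in the second the $r$th failure arrives first and the hypothesis is rejected.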

We now proceed to find certain useful properties of the truncated rule based on $V(t)$. To find these properties, we first remark (defining $x_{0,n}$ as zero) that

(24) $\sum_{i=1}^{r} x_{i,n} + (n - r)x_{r,n} = \sum_{i=1}^{r}(n - i + 1)(x_{i,n} - x_{i-1,n}).$

Introducing new random variables defined by

(25) $\xi_1 = nx_{1,n}, \qquad \xi_i = (n - i + 1)(x_{i,n} - x_{i-1,n}),$

$\hat\theta_{r,n} > C$ can be written as

(26) $\sum_{i=1}^{r}\xi_i > rC.$

The $\xi_i$ are mutually independent random variables, each distributed with common p.d.f. $\theta^{-1}e^{-x/\theta}$, $x > 0$, $\theta > 0$. If we interpret $\xi_i$ as the time interval between the $(i - 1)$st and $i$th event in a Poisson process having mean occurrence rate $\lambda = \theta^{-1}$, it is clear that $\sum_{i=1}^{r}\xi_i > rC$ if and only if $k$, the number of events in an interval of length $rC$, is $0 \le k \le r - 1$. If the number of events in such an interval is $\ge r$, $\sum_{i=1}^{r}\xi_i \le rC$. Thus the probability of reaching a decision requiring exactly $\rho = k$ failures is

(27) $\Pr(\rho = k \mid \theta) = p(k; \mu_\theta), \qquad k = 0, 1, 2, \cdots, r - 1,$

(28) $\Pr(\rho = r \mid \theta) = 1 - \sum_{k=0}^{r-1} p(k; \mu_\theta).$

In (27) and (28), $\mu_\theta = rC/\theta$ and $p(k; \mu_\theta) = \mu_\theta^k e^{-\mu_\theta}/k!$. The expected number of observations to reach a decision is

(29) $E_\theta(\rho) = \sum_{k=0}^{r} k\Pr(\rho = k \mid \theta) = \mu_\theta\left[\sum_{k=0}^{r-2} p(k; \mu_\theta)\right] + r\left[1 - \sum_{k=0}^{r-1} p(k; \mu_\theta)\right].$

It can be verified that $E_\theta(T)$ for the $V(t)$ procedure can be written (as in the replacement or nonreplacement case) as

(30) $E_\theta(T) = \sum_{k=1}^{r}\Pr(\rho = k \mid \theta)E_\theta(X_{k,n}),$

where $\Pr(\rho = k \mid \theta)$ is given by (27) and (28) and $E_\theta(X_{k,n})$ is given by (10). Finally $L(\theta)$, the probability of accepting $\theta = \theta_0$ when $\theta$ is true, is given by $L(\theta) = \sum_{k=0}^{r-1} p(k; \mu_\theta)$.

Up to this point in the present section we have been treating the nonreplacement situation. It is interesting to see what happens if failed items are replaced at once by new items drawn from the p.d.f. $\theta^{-1}e^{-x/\theta}$. As in Section 2, let $x_{k,n}$ be the time when the $k$th failure occurs (whether it be an original item or replacement item) measured from the beginning of the experiment. It can be shown, in the replacement case, that if one starts with $n$ items, then the "best" region of acceptance, in the Neyman-Pearson sense, for testing a hypothesis $H_0$ that $\theta = \theta_0$ against alternatives of the form $\theta = \theta_1$ $(\theta_1 < \theta_0)$, based on the first $r$ failure times $x_{1,n}, x_{2,n}, \cdots, x_{r,n}$, is of the form $\hat\theta_{r,n} > C$, where $\hat\theta_{r,n}$ is now simply equal to

(31) $\hat\theta_{r,n} = nx_{r,n}/r.$

It follows that the region of acceptance for $H_0$ is of the form $x_{r,n} > C^* = rC/n$. Use of $x_{r,n} > C^*$ as a region of acceptance means in words that the test is terminated at $\min(X_{r,n}, C^*)$ with acceptance of $H_0$ if truncation occurs at $C^*$ and rejection of $H_0$ if truncation occurs at $X_{r,n}$. This is precisely the test treated in Section 2 with $r = r_0$ and $C^* = T_0$.


4. Some computational remarks. In Section 2 we gave formulae for the O.C. curve, the expected waiting time, and expected number of items failed in the course of reaching a decision for any preassigned $n$, $T_0$, and $r_0$. We now give a procedure for finding the appropriate truncated test (that is, for finding $r_0$ and $n$) when the truncation time $T_0$ is preassigned and the O.C. curve is required (for preassigned type I error, $\alpha$, and type II error, $\beta$) to be such that $L(\theta_0) \ge 1 - \alpha$ and $L(\theta_1) \le \beta$. Here $\theta_0$ and $\theta_1$ are preassigned with $\theta_0 > \theta_1$.

To find such a test we recall [2] that the best acceptance region of size $\alpha$ for the hypothesis $\theta = \theta_0$ (against any alternative $\theta_1 < \theta_0$), based on the first $r$ out of $n$ failures, for preassigned $r$ and $n$, is

(32) $\hat\theta_{r,n} > C = \theta_0\chi^2_{1-\alpha}(2r)/2r.$


(A chi square variable with $n$ degrees of freedom is denoted as $\chi^2(n)$. The constant $\chi^2_\gamma(n)$ is defined by the equality $\Pr(\chi^2(n) > \chi^2_\gamma(n)) = \gamma$.) In order that the






TABLE 1
Values of $r$ (upper numbers) and of $\chi^2_{1-\alpha}(2r)/2$ (lower numbers) such that the test based on using $\hat\theta_{r,n} > C = \theta_0\chi^2_{1-\alpha}(2r)/2r$ as acceptance region for $\theta = \theta_0$ will have $L(\theta_0) = 1 - \alpha$ and $L(\theta_1) \le \beta$

α = .01 α = .05

β = .05 β = .10 β = .01 β = .05


101 83 95 67 
79.1 | 63. 79.6 | 54.1 


35 30 33 23 
22.7 ; 24.2 


21 19 
12.4 8.46 





13 | 10 
5.43 


51 


5 4 
2.91 .33 ‘ 97 1.37 


‘ ‘ 3 2 2 
.823 823 ‘ 818 .818 10 . 532 .532 


test have an O.C. curve for which $L(\theta_0) = 1 - \alpha$ and $L(\theta_1) \le \beta$, we need to choose $r$ suitably. The appropriate values of $r$ for certain values of $\alpha$, $\beta$, and $\theta_0/\theta_1$ are given in Table 1. For values of $\alpha$, $\beta$, and $\theta_0/\theta_1$ not given in the table, the appropriate $r$ to use is the smallest integer $r$ such that $\chi^2_{1-\alpha}(2r)/\chi^2_\beta(2r) \ge \theta_1/\theta_0$.
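The search for the smallest $r$ can be sketched without tables. The code below is an illustration, not the method used to build Table 1: it relies on the standard identity $\Pr(\chi^2(2r) > x) = \sum_{k=0}^{r-1} e^{-x/2}(x/2)^k/k!$ for even degrees of freedom and finds the percentage points by bisection. All function names are hypothetical.

```python
import math

def chi2_upper_tail(x, r):
    """Pr(chi-square with 2r d.f. > x), via the Poisson sum identity."""
    half = x / 2.0
    term, total = math.exp(-half), 0.0
    for k in range(r):
        total += term
        term *= half / (k + 1)
    return total

def chi2_point(gamma, r):
    """The constant x with Pr(chi2(2r) > x) = gamma, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, r) > gamma:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, r) > gamma:
            lo = mid            # the tail is still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

def smallest_r(alpha, beta, ratio):
    """Smallest r with chi2_{1-alpha}(2r) / chi2_{beta}(2r) >= theta_1/theta_0,
    where ratio = theta_0 / theta_1."""
    r = 1
    while chi2_point(1 - alpha, r) / chi2_point(beta, r) < 1.0 / ratio:
        r += 1
    return r
```

With $\alpha = \beta = .05$ this search returns $r = 5$ for $\theta_0/\theta_1 = 5$ and $r = 10$ for $\theta_0/\theta_1 = 3$, the values used in Problems 1 and 5 of Section 5.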

It is now an easy matter, in the replacement case, to find a truncated test meeting the conditions prescribed in the opening paragraph of this section. In view of the last two paragraphs of Section 3, the appropriate $r_0$ in the replacement case is given by the values in Table 1. Furthermore, we want $T_0 = C^* = rC/n = \theta_0\chi^2_{1-\alpha}(2r)/2n$. Since $n$ must be an integer, the equality can be satisfied only approximately. For all practical purposes $n$ can be chosen as

(33) $n = [\theta_0\chi^2_{1-\alpha}(2r_0)/2T_0],$

where $[x]$ means the greatest integer $\le x$. It is interesting to note that the appropriate $n$, for fixed $\alpha$ and $\beta$, is inversely proportional to the time of truncation $T_0$. Thus, for example, to reduce the truncation time by a factor of two requires doubling $n$. It is clear from (33) that the values of $\chi^2_{1-\alpha}(2r_0)/2$ are useful to tabulate. These are given below the associated $r_0$ in Table 1.

The O.C. curve of the test $\min[X_{r_0,n}, T_0]$, where $r_0$ is given by Table 1 and $n$ by (33), is such that $L(\theta_0) \ge 1 - \alpha$, but in some cases $L(\theta_1)$ may be slightly $> \beta$. This can be avoided in either of two ways. One way is to give the experimenter the freedom to use, instead of $T_0$, the slightly larger truncation time $T_0' = \theta_0\chi^2_{1-\alpha}(2r_0)/2n$; the test based on $\min[X_{r_0,n}, T_0']$ will have $L(\theta_0) = 1 - \alpha$ and $L(\theta_1) \le \beta$. The other way is to use $n + 1$ items throughout the test, and to use, instead of $T_0$, the slightly smaller truncation time $T_0'' = \theta_0\chi^2_{1-\alpha}(2r_0)/2(n + 1)$; the test based on $\min[X_{r_0,n+1}, T_0'']$ will have $L(\theta_0) = 1 - \alpha$ and $L(\theta_1) < \beta$. In most cases it will be a matter of indifference which procedure one adopts.

The most direct (and also the most lengthy) procedure for finding a truncated nonreplacement test meeting the conditions prescribed in the opening paragraph of this section is to note that such a test is equivalent to a binomial situation in which we test $p_0 = 1 - e^{-T_0/\theta_0}$ against $p_1 = 1 - e^{-T_0/\theta_1}$, and want the O.C. curve to be such that $L(p_0) \ge 1 - \alpha$ and $L(p_1) \le \beta$. In binomial terms, we are seeking a sample size $n$ and a rejection number $r_0$ such that we will accept the hypothesis that $p = p_0$ if the number of defectives (failures) in the sample is $\le r_0 - 1$. The hypothesis that $p = p_0$ will be rejected if the number of defectives in a sample of size $n$ is $\ge r_0$. The detailed calculations can be carried out in any given situation by using the Binomial Tables [8] or Tables of the Incomplete Beta Function [6].

While the procedure described in the preceding paragraph can always be worked out, it is both tedious and time consuming. If the values of $\alpha$ and $\beta$ are small and the ratio $\theta_0/T_0$ is substantially more than one (say $\ge 3$), then the required $n$ will be fairly large. In such cases a somewhat less exact, but much shorter, way of finding the appropriate $r_0$ and $n$ can be used. As $r_0$ use the same value as in the replacement case. Let the sample size $n = [r_0/(1 - e^{-T_0/C})]$, where $C = \theta_0\chi^2_{1-\alpha}(2r_0)/2r_0$. The justification for this approximation is briefly the following. If $n$ is substantially more than $r_0$, then the O.C. curve based on the rule $\beta_{r_0,n}x_{r_0,n} > C$, where $\beta_{r_0,n} = 1/E(X_{r_0,n})$ (the expectation being taken with $\theta = 1$), is very close to the O.C. curve based on the rule $\hat\theta_{r_0,n} > C$. To truncate experimentation at time $T_0$ means finding an $n$ such that $C/\beta_{r_0,n} = T_0$. When $n$ is large $1/\beta_{r_0,n} \approx \log[n/(n - r_0)]$. After some simple manipulation we arrive at the above formula for $n$.
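The two sample-size formulas can be compared numerically. The sketch below is an illustration under assumptions: the function names are hypothetical, the tabulated half percentage point $\chi^2_{1-\alpha}(2r_0)/2 = 1.97$ for $r_0 = 5$ is taken from the worked example of Section 5, and the nonreplacement value is left unrounded because the sketch does not settle how the bracket was rounded in the published table.

```python
import math

def replacement_sample_size(theta0, T0, half_chi2):
    """Formula (33): n = [theta0 * chi2_{1-alpha}(2 r0) / (2 T0)],
    with half_chi2 = chi2_{1-alpha}(2 r0) / 2."""
    return math.floor(theta0 * half_chi2 / T0)

def nonreplacement_sample_size(theta0, T0, r0, half_chi2):
    """Unrounded r0 / (1 - exp(-T0 / C)), with C = theta0 * half_chi2 / r0."""
    C = theta0 * half_chi2 / r0
    return r0 / (1 - math.exp(-T0 / C))

n_replacement = replacement_sample_size(10_000, 500, 1.97)
n_nonreplacement = nonreplacement_sample_size(10_000, 500, 5, 1.97)
```

For $\theta_0/T_0 = 20$ and $r_0 = 5$ this gives $n = 39$ in the replacement case and a value just under 42 in the nonreplacement case, so a few more items are needed when failed items are not replaced.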

In Table 2, we give some values of $n$ computed by this formula for $\alpha = .01, .05$; $\beta = .01, .05$; $\theta_0/\theta_1 = 2, 3, 5$; and $\theta_0/T_0 = 3, 5, 10, 20$. These values have been checked by computing $L(\theta_0)$ and $L(\theta_1)$; the O.C. curve does come very close to meeting the requirements $L(\theta_0) \ge 1 - \alpha$ and $L(\theta_1) \le \beta$.


TABLE 2 
Values of n to be used in truncated nonreplacement procedures. 


(α, β) = .01, .01    .01, .05    .05, .01


120 fl 15 87 30 ‘ 90 13 59 
182 61 ; 22 , 132 15 § | «138 2 | 20 90 
340 | 113 | 39 | 245 82 258 36 «169 


657 | 216 | 74 | 472 157 $99 69 325 





5. Examples. 


Problem 1. Find a truncated replacement plan for which $T_0 = 500$ hours, which will accept a lot with mean life = 10,000 hours at least 95 per cent of the time and reject a lot with mean life = 2,000 hours at least 95 per cent of the time. Compute $L(\theta)$, $E_\theta(T)$, and $E_\theta(r)$ at $\theta = 10{,}000$ and $\theta = 2{,}000$, respectively.

Solution. In this case $\theta_0 = 10{,}000$, $\theta_1 = 2{,}000$, $\alpha = \beta = .05$. Since $\theta_0/\theta_1 = 5$, it follows from Table 1 that $r_0 = 5$. Since $\theta_0/T_0 = 20$, $n = [(1.97)(20)] = 39$. Thus the following truncated replacement plan meets the requirements. Start the life test with $n = 39$ items. As soon as an item fails, replace it by a new item. Accept the lot if $\min[X_{5,39}, 500] = 500$ and reject the lot if $\min[X_{5,39}, 500] = X_{5,39}$. If the lot is rejected, experimentation is stopped at $X_{5,39}$, the time of occurrence of the fifth failure.

For $\theta = 10{,}000$, $\lambda_\theta T_0 = 1.95$. Using the tables [5] and (21), it is easily verified that $L(\theta) = .952$. Substituting in (15) and (16) respectively gives $E_\theta(r) = 1.93$ and $E_\theta(T) = 495$. For $\theta = 2{,}000$, $\lambda_\theta T_0 = 9.75$. For this value of $\theta$, $L(\theta) = .034$, $E_\theta(r) = 4.95$, and $E_\theta(T) = 254$.
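The figures of this solution can be recomputed directly from (13), (15), (16) and (21) instead of being read from Poisson tables. The following is a minimal sketch; the function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """p(k; mean) = e^{-mean} mean^k / k!."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def replacement_plan(n, r0, T0, theta):
    """Return (L, E_r, E_T) from (21) with (13), (15) and (16)."""
    mean = n * T0 / theta                                   # lambda_theta * T0
    accept = sum(poisson_pmf(k, mean) for k in range(r0))   # L(theta)
    head = sum(poisson_pmf(k, mean) for k in range(r0 - 1))
    e_r = mean * head + r0 * (1 - accept)                   # (15)
    e_t = theta / n * e_r                                   # (16)
    return accept, e_r, e_t

good = replacement_plan(39, 5, 500, 10_000)
bad = replacement_plan(39, 5, 500, 2_000)
```

Rounded to the precision quoted above, `good` reproduces $L = .952$, $E_\theta(r) = 1.93$, $E_\theta(T) = 495$, and `bad` reproduces $L = .034$, $E_\theta(r) = 4.95$, $E_\theta(T) = 254$.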

Problem 2. Same as 1 except that we want a nonreplacement procedure.

Solution. $r_0 = 5$. According to Table 2, the sample size is $n = 42$. For $\theta = 10{,}000$, $T_0/\theta = .05$, and $p_\theta = 1 - e^{-T_0/\theta} = .049$. Using the table [8], one finds $L(\theta) = .946$. Substituting in (4) and (5) respectively gives $E_\theta(r) = 2.02$ and $E_\theta(T) = 494$. For $\theta = 2{,}000$, $T_0/\theta = .25$. For this value of $\theta$, $L(\theta) = .031$, $E_\theta(r) = 4.91$, and $E_\theta(T) = 248$.

Problem 3. Consider the truncated replacement plan meeting the conditions of Problem 1. For what values of $\theta$ is $L(\theta) = .5$? What are $E_\theta(r)$ and $E_\theta(T)$ for this value of $\theta$?

Solution. To find the $\theta$ such that $L(\theta) = .5$ means finding $\lambda_\theta$ such that $\sum_{k=0}^{4} p(k; \lambda_\theta T_0) = .5$. Using the tables [5] we see that this means that $\lambda_\theta T_0 = 4.67$. Therefore $\theta = 4{,}180$. From (15) and (16) we find that $E_\theta(r) = 3.97$ and $E_\theta(T) = 424$.
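Where Poisson tables are not at hand, the value of $\lambda_\theta T_0$ in this solution can be found by bisection, since $L$ decreases steadily as the Poisson mean grows. The sketch below is one way to do it; the function names and the search bracket $[0, 50]$ are assumptions.

```python
import math

def acceptance_probability(mean, r0):
    """L = sum_{k=0}^{r0-1} p(k; mean), with mean = lambda_theta * T0."""
    term, total = math.exp(-mean), 0.0
    for k in range(r0):
        total += term
        term *= mean / (k + 1)
    return total

def mean_for_acceptance(target, r0, lo=0.0, hi=50.0):
    """Bisection for the Poisson mean at which L equals `target`."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if acceptance_probability(mid, r0) > target:
            lo = mid          # L decreases as the mean grows
        else:
            hi = mid
    return (lo + hi) / 2.0

n, r0, T0 = 39, 5, 500
mean = mean_for_acceptance(0.5, r0)
theta = n * T0 / mean
```

The search gives a mean of about 4.67 and hence a $\theta$ of roughly 4,175, which agrees with the 4,180 quoted above to the accuracy of the tables.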

Problem 4. Consider the truncated nonreplacement plan meeting the conditions of Problem 2. For what values of $\theta$ is $L(\theta) = .5$?

Solution. This means finding $p_\theta$ such that $\sum_{k=0}^{4} b(k; 42, p_\theta) = .5$. Using the tables [8] this means $p_\theta = .1104$. Since $p_\theta = 1 - e^{-T_0/\theta}$, the appropriate $\theta = 4{,}274$. Here $E_\theta(r)$ and $E_\theta(T)$ will be approximately the same as in the replacement case; they have not been computed exactly.

Problem 5. Find a test of the form $\hat\theta_{r,n} > C$, discussed in Section 3, which will have an O.C. curve such that $L(\theta_0) = .95$ when $\theta_0 = 1{,}500$, and $L(\theta_1) \le .05$ when $\theta_1 = 500$.

Solution. From Table 1, it is readily verified that $r = 10$. Therefore the acceptance region has according to (32) the form

$\hat\theta_{10,n} > C = \theta_0\chi^2_{1-\alpha}(20)/20 = 815.$


Problem 6. Set $n = 20$ in Example 5. Compute $E_\theta(\rho)$ and $E_\theta(T)$ at $\theta = 1{,}500$ and $\theta = 500$, respectively.

Solution. If we interpret the test as in the second paragraph of Section 3, it may be possible to stop experimentation with fewer than 10 failures and before time $x_{10,20}$. Using formulae (29) and (30) for $\theta = 1{,}500$ we get $E_\theta(\rho) = 5.39$ and $E_\theta(T) = 475$. When $\theta = 500$, $E_\theta(\rho) = 9.93$, and $E_\theta(T) = 331$. It is interesting to note that if $\hat\theta_{10,20}$ is computed only after observing $x_{10,20}$, the number of failures would always be 10. Furthermore, the expected waiting time to reach a decision would then be $E_\theta(X_{10,20}) = \theta\sum_{k=1}^{10}(21 - k)^{-1}$. For $\theta = 1{,}500$, $E_\theta(X_{10,20}) = 1{,}004$ and for $\theta = 500$, $E_\theta(X_{10,20}) = 335$. Thus there is considerable saving if we take advantage of continuous availability of information. The ultimate in this direction is a purely sequential procedure which is treated in detail in another paper.
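The comparison made in this solution can be retraced numerically from (29) and (10). The sketch below is illustrative; the function names are hypothetical and $C = 815$ is the rounded constant from Problem 5.

```python
import math

def expected_failures(r, C, theta):
    """Formula (29) with mu_theta = r*C/theta."""
    mu = r * C / theta
    pmf = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(r)]
    return mu * sum(pmf[: r - 1]) + r * (1 - sum(pmf))

def wait_for_rth_failure(r, n, theta):
    """E_theta(X_{r,n}) = theta * sum_{k=1}^{r} 1/(n + 1 - k), as in (10)."""
    return theta * sum(1.0 / (n + 1 - k) for k in range(1, r + 1))

e_rho = expected_failures(10, 815, 1500)
full_wait_good = wait_for_rth_failure(10, 20, 1500)
full_wait_bad = wait_for_rth_failure(10, 20, 500)
```

This gives $E_\theta(\rho)$ of about 5.39 at $\theta = 1{,}500$, and waiting times for the tenth failure of roughly 1,003 and 334 hours, within a unit of the figures quoted above.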


REFERENCES 

[1] D. J. Davis, "An analysis of some failure data," J. Amer. Stat. Assn., Vol. 47 (1952), pp. 113-150.
[2] B. Epstein and M. Sobel, "Life testing I.," J. Amer. Stat. Assn., Vol. 48 (1953), pp. 486-502.
[3] B. Epstein and M. Sobel, "Some theorems relevant to life testing from an exponential distribution," Ann. Math. Stat., Vol. 25 (1954), pp. 373-381.
[4] A. Wald, Sequential Analysis, John Wiley and Sons, 1947.
[5] E. C. Molina, Poisson's Exponential Binomial Limit, D. Van Nostrand, 1949.
[6] K. Pearson, Tables of the Incomplete Beta Function, Cambridge University Press, 1948.
[7] K. Pearson, Tables of the Incomplete Γ-Function, Cambridge University Press, 1951.
[8] Tables of the Binomial Probability Distribution, National Bureau of Standards, Applied Mathematics Series 6, 1950.





ON THE DISTRIBUTION OF THE RATIO OF THE ith OBSERVATION¹
IN AN ORDERED SAMPLE FROM A NORMAL POPULATION TO AN
INDEPENDENT ESTIMATE OF THE STANDARD DEVIATION


By K. C. S. Pillai and K. V. Ramachandran
University of North Carolina 


1. Summary. This paper deals with the distribution of any observation, $x_i$, in an ordered sample of size $n$ from a normal population with zero mean and unit standard deviation. The distribution has been developed as a series of Gamma functions, and has been used to obtain the distribution of $q_i = (x_i/s)$, where $s$ is an independent estimate of the standard deviation with $\nu$ degrees of freedom. In a similar manner the distribution of the Studentized maximum modulus $u_n = |x_n/s|$ has been obtained and upper 5 per cent points of $q_n$ and upper and lower 5 per cent points of $u_n$ have been given. The method of obtaining the different distributions essentially depends on appropriate expansions of the normal probability integral and its powers in the intervals $-\infty$ to $x$ and $0$ to $x$.


2. Introduction. The study of ordered samples from a normal population has led many authors to the construction of different Studentized tests based on outlying observations. One of the important tests in this group is that based on the Studentized range, for which tables of significance levels have been given by May [4] and Pillai [8]. Nair [5] has considered the distribution of the Studentized extreme deviate from the sample mean.

In the present paper the Studentized extreme deviate from the population mean as well as the Studentized maximum modulus are discussed and their distributions given for small sample sizes. Roy and Bose [1] and Tukey [9] have suggested the use of Studentized maximum modulus for simultaneous confidence interval statements. These authors have illustrated the use of the upper percentage points of the Studentized maximum modulus.

Box [2], [3] has suggested the criterion $u_n$ as a possible test for platykurtosis. He points out that if the mean is assumed known, then $u_n$ is the likelihood criterion for testing the null hypothesis of normality against the alternative that the distribution is rectangular. Significance is attained if $u_n$ is too small; the test uses the lower tail area of the Studentized maximum modulus. The Studentized extreme deviate from the population mean can be used in different situations, including the problem of simultaneous confidence interval statements.

Received 6/29/53, revised 2/8/54.
¹ Presented to the meetings of the Institute of Mathematical Statistics at Chicago, Illinois, December 30, 1952.







TABLE I
Coefficients $a_i^{(k)}$ for determining coefficients in expansion of normal probability integral (cf. Eqns. (3.7) and (3.8))


k = 1    k = 2    k = 3    k = 4    k = 5    k = 6    k = 7


. 50000000 . 25000000 . 12500000 . 06250000 -03125000 .01562500 00781250 
. 39894228 . 39894228 . 29920671 . 19947114 - 12466946 -074801677 043634312 
- 08333333 . 24248827 - 30123241 - 28039908 . 22498535 . 16483276 . 11356002 
- 00000000 .06649038 | . 16322920 . 22672284 - 24184706 . 22106882 18252664 
. 0960444444 .013888889 -055413735 . 11879665 - 17364827 . 20227295 20317475 
0744326920 .0°99735572 -019947114 . 0483 14782 092106224 . 13648695 . 16794983 


0°38580247 0950799863 011225059 021654495 -042857313 -074771485 - 11017451 

- 0914072038 - 0°905887 46 . 0249660421 011190082 021145714 038066474 -062529157 
0416075 103 -0416322188 . 0712255041 0947138211 -010536732 019623411 033666900 
0417590048 -0°68527894 , 0924943180 -0714392049 -0°44470780 | .0°96627902 -017740216 
0*53583677 0442256972 0913636594 0944476986 0715758833 0741848164 -0°87577 403 


-0*7 1070901 -0941143388 0460698505 -0°19767291 0758046643 0216453177 - 0739238553 
.0714884355 0°13383047 0410421860 076549643 . 0924399876 0966245098 0716581736 
0740243139 -0°21190527 0*38 170362 -0*18085787 -0°91007838 | .0°27544134 0770311663 
0°35438940 -0°23053505 -0°80135071 0537096286 -0*26274003 | .0°10376136 0929463704 
015186090 0993991620 -0°46425639 -0°17964457 0°76383520 | .0134166748 011424039 


0'173831125 0711517901 0747000596 0*7 4037123 . 0530334492 0411514181 041122684 
09589329046 0936894111 0713802971 0612599920 .0510884773 | .0543525045 0414989845 
.0'213672431 -0°94822617 0927623121 0969957761 . 0625212504 0°15096316 0556433342 
0°119067827 .0'°12939560 . 0°28220259 . 0999240268 . 0752692237 . 0°42 168894 - 0519806847 
022787385 — .0°°50145804 0°13135284 . 0853937376 0°23621441 .0811881132 0662274205 
. 09958406921 .0'241086623 -0°14925379 -0955745257 0992015238 . 0744059326 0°19924791 
01634526341 0120526830 | .0''62487064 .0°14669723 0*15685712 0° 15015557 0770504086 
01416064956 -01911907969 . 02°] 4601423 . 01036846548 -0°11547196 = .0®34471983 .0°23286637 
01847953251 0!214440848 -0°224978183 - 01934521877 -0°12647069 .0°73620775 . 096369897 | 
0641326546 0'531756106 O! 288085611 01117015699 0'°6 4692264 0931172628 0° 17628625 
(0206 1478527 0469636292 044998 13888 0118472470 = .08"67974001 011601965 . 096308 1504 
01497882639 6'778408304 062775844 .0198391492 0'115741833  .0'919918567 0°20448365 
02273188723 014830387828 0!°34400747 .0°220151299 . 01249685732 0!) 18079118 .01947412257 
(921740032 0'*18024897 01436804948 0'437879276 0'242511479 .0°116394551 09910212511 
0248 1320803 0'786164817 0'61 1854144 0915211642 0'921457246 0'279226637 01141797437 


3. Power series expansion for the normal probability integral. In this section
we develop a power series expansion for the normal probability integral
over the range $-\infty$ to $x$. Let

(3.1)    $I(k, x) = \left( \int_{-\infty}^{x} e^{-t^2/2}\,dt \Big/ \sqrt{2\pi} \right)^{k}.$

An appropriate expansion (cf. Section 5) for $I(k, x)$ is given by

(3.2)    $I(k, x) = e^{-kx^2/6}\left(a_0^{(k)} + a_1^{(k)} x + a_2^{(k)} x^2 + \cdots\right),$


where the a’s are given by the recurrence relations 
k) ( 1 "sy 
(27 + Lago jys (k/V/ 2r) las; — ( L3)a2; » + 
(3.3) 
+ {(—1)°/(7!3°) fag 
2) 


(A ‘ k~1 k 
- 2j+42 (h Vv Dr) {agj44 —_ (18)ay; ' 





STUDENTIZED ORDER STATISTIC 


and $a_0^{(k)} = (\tfrac{1}{2})^k$. Thus

(3.5)    $\left( \int_{x}^{\infty} e^{-t^2/2}\,dt \Big/ \sqrt{2\pi} \right)^{m} = I(m, -x)$

and by using (3.2)

(3.6)    $I(m, -x) = e^{-mx^2/6}\left(a_0^{(m)} - a_1^{(m)} x + a_2^{(m)} x^2 - \cdots\right).$

Hence

(3.7)    $I(k, m, x) = e^{-(k+m)x^2/6}\left(b_0^{(k,m)} + b_1^{(k,m)} x + b_2^{(k,m)} x^2 + \cdots\right),$

where $I(k, m, x) = I(k, x)\,I(m, -x)$ and

(3.8)    $b_i^{(k,m)} = \sum_{j=0}^{i} (-1)^{j} a_{i-j}^{(k)} a_j^{(m)}.$

Pillai [7] has given a similar expansion for the powers of the normal probability
integral in the interval 0 to $x$. The $a_i^{(k)}$ coefficients for $i$ ranging from 0 to 30
and $k$ from 1 to 7 are given in Table I.
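The expansion (3.2) can be checked numerically. The following is a minimal sketch, not the authors' computing method: it assumes the coefficients $a_i^{(1)}$ are obtained by multiplying the Taylor series of the normal probability integral by that of $e^{x^2/6}$, and that $a_i^{(k)}$ follows by repeated series multiplication, since $I(k, x) = I(1, x)^k$. The function names `series_product` and `a_coefficients` are illustrative only.

```python
import math


def series_product(u, v, n_terms):
    """Multiply two truncated power series given as coefficient lists."""
    out = [0.0] * n_terms
    for i, ui in enumerate(u[:n_terms]):
        if ui == 0.0:
            continue
        for j, vj in enumerate(v[: n_terms - i]):
            out[i + j] += ui * vj
    return out


def a_coefficients(k_max=7, n_terms=31):
    """Return {k: [a_0^(k), ..., a_{n_terms-1}^(k)]} for k = 1..k_max."""
    # Taylor coefficients about 0 of the normal probability integral:
    # 1/2 + (1/sqrt(2 pi)) * sum_m (-1)^m x^(2m+1) / (2^m m! (2m+1)).
    phi = [0.0] * n_terms
    phi[0] = 0.5
    for m in range(n_terms // 2):
        phi[2 * m + 1] = (-1) ** m / (
            math.sqrt(2 * math.pi) * 2 ** m * math.factorial(m) * (2 * m + 1)
        )
    # Taylor coefficients of exp(x^2 / 6).
    grow = [0.0] * n_terms
    for m in range((n_terms + 1) // 2):
        grow[2 * m] = 1.0 / (6 ** m * math.factorial(m))
    # a^(1) is the product series; a^(k) is its k-th power.
    a = {1: series_product(phi, grow, n_terms)}
    for k in range(2, k_max + 1):
        a[k] = series_product(a[k - 1], a[1], n_terms)
    return a


if __name__ == "__main__":
    a = a_coefficients()
    for i in range(4):
        print(i, [round(a[k][i], 8) for k in range(1, 8)])
```

Under these assumptions the leading rows agree with the legible entries of Table I, for instance $a_2^{(1)} = 1/12$ and $a_3^{(1)} = 0$.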


4. Distributions of the ith ranked observation, Studentized extreme deviate
and Studentized maximum modulus. Let $x_1 \le x_2 \le \cdots \le x_n$ be an ordered
sample from a normal population with zero mean and unit standard
deviation. The distribution of any ranked observation, $x_i$, is given by

(4.1)    $p(x_i) = \frac{n!}{(i-1)!\,(n-i)!\,\sqrt{2\pi}}\; I(i-1, x_i)\, I(n-i, -x_i)\, e^{-x_i^2/2}.$

Using (3.7), $p(x_i)$ takes the form

(4.2)    $p(x_i) = \frac{n!}{(i-1)!\,(n-i)!\,\sqrt{2\pi}}\; e^{-(n+2)x_i^2/6}\left(b_0^{(i-1,\,n-i)} + b_1^{(i-1,\,n-i)} x_i + \cdots\right).$

The distribution of an independent estimate of the standard deviation $s$ is given
by

(4.3)    $p(s) = \left[2(\nu/2)^{\nu/2}/\Gamma(\nu/2)\right] s^{\nu-1} e^{-\nu s^2/2}.$

Multiplying (4.2) by (4.3), using the transformation $q_i = x_i/s$, and integrating
with respect to $s$ in the interval 0 to $\infty$, we get

(4.4)    $p(q_i) = \frac{n!\,(\nu/2)^{\nu/2}}{(i-1)!\,(n-i)!\,\sqrt{2\pi}\;\Gamma(\nu/2)} \sum_{j=0}^{\infty} b_j^{(i-1,\,n-i)}\, q_i^{\,j} \left[\frac{6}{(n+2)q_i^2 + 3\nu}\right]^{(j+\nu+1)/2} \Gamma\!\left(\frac{j+\nu+1}{2}\right).$

Using (4.4), the probability integral of $q_i$ can be evaluated with the help of
Tables of the Incomplete Beta Function [6]. Putting $i = n$ in (4.4) gives

(4.5)    $p(q_n) = \frac{n\,(\nu/2)^{\nu/2}}{\Gamma(\nu/2)\sqrt{2\pi}} \sum_{j=0}^{\infty} b_j^{(n-1,\,0)}\, q_n^{\,j} \left[\frac{6}{(n+2)q_n^2 + 3\nu}\right]^{(j+\nu+1)/2} \Gamma\!\left(\frac{j+\nu+1}{2}\right).$







TABLE II

Upper 5% points of $q_n = (x_n/s)$, for ordered samples of sizes $n$
with $\nu$ degrees of freedom

[Table entries are not legible in this copy.]


Upper 5 per cent points of $q_n$, computed using (4.5), are given in Table II, for
$n$ from 1 to 8.

For obtaining the distribution of the maximum $|x|$, we may start with the
probability law

(4.6)    $\sqrt{2/\pi}\; e^{-t^2/2}, \qquad 0 < t < \infty,$

and noting [7] that

(4.7)    $\left[\int_0^x e^{-t^2/2}\,dt\right]^{k} = x^k e^{-kx^2/6}\left[1 + C_2^{(k)} x^4 + C_3^{(k)} x^6 + \cdots\right],$







TABLE III

Upper 5% points of $u_n = |x_n/s|$ for ordered samples of sizes $n$
with $\nu$ degrees of freedom

[Table entries are not legible in this copy.]

(J. W. Tukey [9] states that some upper percentage points of the Studentized
maximum modulus were computed by P. Nemenyi.)


we get

(4.8)    $p(|x_n|) = n\,(2/\pi)^{n/2}\, e^{-(n+2)x_n^2/6} \sum_{j=0}^{\infty} C_j^{(n-1)}\, |x_n|^{2j+n-1}.$

Since $u_n = |x_n/s|$, the distribution of $u_n$ is given by

(4.9)    $p(u_n) = n\,(2/\pi)^{n/2}\, \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \sum_{j=0}^{\infty} C_j^{(n-1)}\, u_n^{\,n+2j-1} \left[\frac{6}{(n+2)u_n^2 + 3\nu}\right]^{(n+2j+\nu)/2} \Gamma\!\left(\frac{n+2j+\nu}{2}\right).$

It may be noted that in (4.8) and (4.9) $C_0^{(n-1)} = 1$ and $C_1^{(n-1)} = 0$. The $C$
coefficients are given by Pillai [7] (in his notation they are $K$ coefficients). Using
(4.9), the upper and lower 5 per cent points of $u_n$ have been computed with
the help of Tables of the Incomplete Beta Function, and are given in Tables
III and IV for small values of $n$.


5. Convergence of the series. For examining the convergence of the different
series developed in Sections 3 and 4, let us start with series (4.7) for the case
$k = 1$, given by

(5.1)    $\int_0^x e^{-t^2/2}\,dt = x\, e^{-x^2/6}\left[1 + C_2^{(1)} x^4 + \cdots\right].$







TABLE IV

Lower 5% points of $u_n = |x_n/s|$ for ordered samples of sizes $n$
with $\nu$ degrees of freedom

.O8 ‘ j ‘ ide 48 .82 86 


5 
.07 + 46 oO ‘ te Ak ° 33 .92 


.07 ‘i .59 ; ; 83 8 .92 96 
.07 . 26 A .60 ‘ : ‘ j 95 99 
.07 AT .60 ; ; 8! i 96 Ol 


.06 ‘ AT 61 ; 8 8S 95 O01 .06 
.06 A AT .62 ‘ 8: 91 97 .03 .08 
.06 28 AT .62 91 .98 .04 .09 
.06 , 47 .62 .92 98 .04 .09 
.06 Z 47 .62 .92 .99 .05 10 


sa 
> im OO 


.06 : AT . 63 
06 .28 47 .63 
.06 .28 47 .63 
06 . 28 47 63 


.92 99 05 aa 
.93 | 1.00 .06 ia 
.93 | 1.00 .06 ld 
93 1.00 O06 ‘fH 



If we expand $e^{-t^2/2}$ as a power series and integrate term by term (assuming its
validity, which is easily shown in this case), we get an expansion of the integral
in the form

(5.2)    $\int_0^x e^{-t^2/2}\,dt = \int_0^x \left(1 - \frac{t^2}{2} + \frac{t^4}{8} - \cdots\right) dt = x\left[1 - \frac{x^2}{6} + \frac{x^4}{40} - \cdots\right].$

As the first two terms in square brackets in (5.2) are contained in $e^{-x^2/6}$, the
appropriateness of the series expansion (5.1) is immediately obvious. Since the
integral

(5.3)    $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\,dt,$

the expansion (3.2) follows from (5.1). An examination of the convergence of
the series in (5.1) will thus be enough to show the convergence of the series
(3.2). It can easily be shown [7] that the $C$'s follow the recurrence relation


(5.4) 3(2i + 1)c{” 


Hence 


(5.5) CC 





and 
(5.6) 


where 


3-5 -++-(2i—3)  3-5--+ (i — 5) 


(5.7 A= am < obits 
we G— 1)! G— 2)! 


dees. 

It may be noticed that since the right hand side of (5.1) is an alternating series, 
—C}"/C}2, is always positive. Moreover, A is also positive (except when A is 
the sum of the first two terms on the right side of (5.5) which is equal to zero). 
Since the second term in the square bracket of (5.6) is positive, the right side 
of (5.6) will be increased if we decrease A. Now if we neglect all the terms of 
A except the first two (where the sum of the neglected terms is positive), we 
decrease A and hence increase the right side of (5.6). In other words 


— my Ds me BBE oe Pte 2 
(5.8) ay < ave : ~ | 1 + (21 : 3) (21 »] = -— (@ DD 4 @ 
ce, 3(27 + 1) i(i — 2) (1 — 2)(2t + 1) 


Hence when $i$ is large

(5.9)    $-C_i^{(1)}/C_{i-1}^{(1)} < 1/2i.$

Again, if we retain the first four terms in the expression for $A$ in (5.7), we get

(5.10)    $\frac{-C_i^{(1)}}{C_{i-1}^{(1)}} < \frac{(i-1)(11i^3 - 78i^2 + 169i - 105)}{3i(i-2)(2i+1)(5i^2 - 29i + 39)}.$

If $i$ is large

(5.11)    $-C_i^{(1)}/C_{i-1}^{(1)} < 11/30i.$

The right side of (5.10) can be made smaller if we consider more terms in the
approximation to $A$. Hence the series $\sum_0^\infty |C_i^{(1)}|$ is convergent, that is,
$\sum_0^\infty C_i^{(1)}$ is absolutely convergent, and the absolute value of the ratio of the $i$th to
the $(i-1)$th term of the power series in (5.1) (with which we are really
concerned) is less than $11x^2/30i$ for large values of $i$ (considering only the first four
terms in (5.7) to approximate $A$). Hence the series (5.1) is absolutely convergent
and therefore the powers of the series are also convergent.

Now consider the series expansion in (3.2). For $k = 1$, it can be shown that
the terms involving even powers of $x$ (which are all positive) sum to
$\tfrac{1}{2} e^{x^2/6}$, so that they contribute exactly $\tfrac{1}{2}$ to $I(1, x)$.
Hence from the absolute convergence of the series (5.1), the absolute
convergence of (3.2) is immediate. It may be noticed that the series (5.1) is rather
rapidly convergent, so that, for a relatively small $x$, only a few terms of the
series will suffice for any degree of accuracy desired in practice.
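The rapid convergence of (5.1) can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it obtains the $C_i^{(1)}$ by multiplying the series in (5.2) by the series for $e^{x^2/6}$, and compares the truncated expansion with the closed form $\sqrt{\pi/2}\,\mathrm{erf}(x/\sqrt{2})$ of the integral. The function names are hypothetical.

```python
import math


def c_coefficients(n_terms=15):
    """C_i^(1) in  int_0^x e^{-t^2/2} dt = x e^{-x^2/6} sum_i C_i x^(2i)."""
    # Coefficients, in powers of x^2, of the bracket in (5.2).
    s = [(-1) ** m / (2 ** m * math.factorial(m) * (2 * m + 1))
         for m in range(n_terms)]
    # Coefficients, in powers of x^2, of exp(x^2 / 6).
    e = [1.0 / (6 ** m * math.factorial(m)) for m in range(n_terms)]
    # Cauchy product of the two series.
    return [sum(e[j] * s[i - j] for j in range(i + 1)) for i in range(n_terms)]


def integral_by_series(x, n_terms=15):
    """Truncated form of the right-hand side of (5.1)."""
    c = c_coefficients(n_terms)
    bracket = sum(ci * x ** (2 * i) for i, ci in enumerate(c))
    return x * math.exp(-x * x / 6.0) * bracket


def integral_exact(x):
    """Closed form of int_0^x e^{-t^2/2} dt via the error function."""
    return math.sqrt(math.pi / 2.0) * math.erf(x / math.sqrt(2.0))


if __name__ == "__main__":
    c = c_coefficients()
    print("first coefficients:", c[:5])
    for i in range(3, 8):
        print("ratio -C_i/C_(i-1) for i =", i, ":", -c[i] / c[i - 1])
    print(integral_by_series(1.0), integral_exact(1.0))
```

The printed ratios can be set beside the bounds (5.9) and (5.11), which are stated only for large $i$.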


6. Acknowledgment. The authors wish to express their indebtedness to
Professors R. C. Bose and S. N. Roy for suggesting this problem and for their
guidance in the preparation of this paper.







REFERENCES

[1] S. N. Roy and R. C. Bose, "Simultaneous confidence interval estimation," Ann.
Math. Stat., Vol. 24 (1953), pp. 513-536.
[2] G. E. P. Box, "Nonnormality and tests on variances," Biometrika, Vol. 40 (1953),
pp. 318-335.
[3] G. E. P. Box, "A note on regions for tests of kurtosis," Biometrika, Vol. 40 (1953),
pp. 465-468.
[4] Joyce M. May, "Extended and corrected tables of the upper percentage points of the
'Studentized' range," Biometrika, Vol. 39 (1952), pp. 192-193.
[5] K. R. Nair, "The distribution of the extreme deviate from the sample mean and its
Studentized form," Biometrika, Vol. 35 (1948), pp. 118-144.
[6] K. Pearson, Tables of the Incomplete Beta Function, Cambridge University Press,
(1934).
[7] K. C. S. Pillai, "On the distributions of midrange and semirange in samples from a
normal population," Ann. Math. Stat., Vol. 21 (1950), pp. 100-105.
[8] K. C. S. Pillai, "On the distribution of 'Studentized' range," Biometrika, Vol. 39
(1952), pp. 194-195.
[9] J. W. Tukey, "The problem of multiple comparisons," Preliminary report
(unpublished), Princeton University, p. 169.





ON THE DISTRIBUTION OF THE LIKELIHOOD RATIO¹


By Herman Chernoff


Stanford University


1. Summary and Introduction. A classical result due to Wilks [1] on the
distribution of the likelihood ratio $\lambda$ is the following. Under suitable regularity
conditions, if the hypothesis that a parameter $\theta$ lies on an $r$-dimensional
hyperplane of $k$-dimensional space is true, the distribution of $-2 \log \lambda$ is
asymptotically that of $\chi^2$ with $k - r$ degrees of freedom.

In many important problems it is desired to test hypotheses which are not
quite of the above type. For example, one may wish to test whether $\theta$ is on
one side of a hyperplane, or to test whether $\theta$ is in the positive quadrant of a
two-dimensional space. The asymptotic distribution of $-2 \log \lambda$ is examined
when the value of the parameter is a boundary point of both the set of $\theta$
corresponding to the hypothesis and the set of $\theta$ corresponding to the alternative.

First the case of a single observation from a multivariate normal distribution,
with mean $\theta$ and known covariance matrix, is treated. The general case
is then shown to reduce to this special case where the covariance matrix is
replaced by the inverse of the information matrix. In particular, if one tests
whether $\theta$ is on one side or the other of a smooth $(k - 1)$-dimensional surface
in $k$-dimensional space and $\theta$ lies on the surface, the asymptotic distribution of
$-2 \log \lambda$ is that of a chance variable which is zero half the time and which behaves
like $\chi^2$ with one degree of freedom the other half of the time.


2. Notation and background. We shall use some of the notation and results
of Mann and Wald [2]. In particular, if $\{x_n\}$ is a sequence of $k$-dimensional
chance variables and $\{f_n\}$ a sequence of positive numbers, we write

(1)    $x_n = O_p(f_n)$

if for each $\epsilon > 0$, there is an $M_\epsilon$ such that $P\{|x_n| > M_\epsilon f_n\} < \epsilon$. Similarly
we write

(2)    $x_n = o_p(f_n)$

if $x_n/f_n \to 0$ in probability, or equivalently, if, for each $\epsilon > 0$, there is a sequence
$M_{n\epsilon} \to 0$ such that $P\{|x_n| > M_{n\epsilon} f_n\} < \epsilon$.

Mann and Wald have shown that the calculus used with the usual $O$ and $o$
notation applies to $O_p$ and $o_p$. For example, if $x_n = O_p(\sqrt{n})$ and $y_n = o_p(1)$,
then $x_n y_n = o_p(\sqrt{n})$. We shall frequently drop the subscript $n$ where there is
no ambiguity.

Received 7/24/53.
¹ This work was prepared with the partial support of the Office of Naval Research.

We write $d(x)$ for the distribution of $x$ and $d_\infty(x_n) = d(x)$ if the limiting
distribution of $x_n$ exists and is that of $x$. The following are included in results
of Mann and Wald [2]:

If $d_\infty(x_n) = d(x)$, then $d_\infty[x_n + o_p(1)] = d(x)$.

If $d_\infty(x_n) = d(x)$ and $g$ is continuous, then $d_\infty[g(x_n)] = d[g(x)]$.

If $x_n \to c$ in probability and $x_n - c = O_p(f_n)$ and $g$ has continuous $r$th order
derivatives at $c$, then

(3)    $g(x_n) = T(x_n, c, r) + o_p(f_n^r)$

where $T(x, c, r)$ is the $r$th order Taylor expansion of $g(x)$ about $c$.

Furthermore, we shall use some properties of likelihood functions and maximum
likelihood estimates which are implied by the following regularity
conditions $\mathcal{R}$ [3].

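CONDITIONS $\mathcal{R}$. The data $X = (x_1, x_2, \cdots, x_n)$ consist of $n$ independent
observations with common density $f(x, \theta)$ satisfying

(a) For almost all $x$, the derivatives

$\frac{\partial \log f}{\partial \theta_i}, \quad \frac{\partial^2 \log f}{\partial \theta_i\, \partial \theta_j}, \quad \frac{\partial^3 \log f}{\partial \theta_i\, \partial \theta_j\, \partial \theta_m}$

exist for every $\theta$ in the closure of a neighborhood $N$ of $\theta = 0$.

(b) If $\theta \in N$,

$\left|\frac{\partial f}{\partial \theta_i}\right| < F(x), \quad \left|\frac{\partial^2 f}{\partial \theta_i\, \partial \theta_j}\right| < F(x), \quad \left|\frac{\partial^3 \log f}{\partial \theta_i\, \partial \theta_j\, \partial \theta_m}\right| < H(x),$

where $F$ is finitely integrable and $E\{H(x)\} < M$, with $M$ independent of $\theta$.

(c) If $\theta \in N$,

$\left\| E\left\{ \frac{\partial \log f}{\partial \theta_i}\, \frac{\partial \log f}{\partial \theta_j} \right\} \right\|$

is finite and positive definite.

Let the likelihood function be given by

$L(X, \theta) = \prod_{\alpha=1}^{n} f(x_\alpha, \theta).$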


From the above conditions, it follows that for $\theta \in N$

(4)    $\frac{1}{n} \log L(X, \theta) = \frac{1}{n} \log L(X, 0) + A'\theta + \tfrac{1}{2}\theta' B \theta + |\theta|^3 O_p(1),$

where $A$ is the vector whose $i$th component is

$\frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial \log f(x_\alpha, 0)}{\partial \theta_i}$

and $B$ is the matrix whose $(i, j)$ term is

$\frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial^2 \log f(x_\alpha, 0)}{\partial \theta_i\, \partial \theta_j}.$

If the "true" value of the parameter is given by $\theta = 0$, that is, $\theta$ is at the origin,
then the asymptotic distribution of $\sqrt{n}\, A$ is normal with mean 0 and covariance
matrix $J$, where

$J = \left\| E\left\{ \frac{\partial \log f(x, 0)}{\partial \theta_i}\, \frac{\partial \log f(x, 0)}{\partial \theta_j} \right\} \right\|$

is the positive definite information matrix and, furthermore, $B \to -J$ in
probability. Also, the maximum likelihood estimate $\hat\theta$ computed under the
assumption $\theta \in N$ satisfies

(5)    $\hat\theta = J^{-1} A + o_p(1/\sqrt{n}).$


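Let us consider the likelihood ratio for the test of a hypothesis $H: \theta \in \omega \subset N$
against the alternative $K: \theta \in \tau \subset N$. For a set $\varphi \subset N$ in $k$-dimensional space
we define

(6)    $P_\varphi(X) = \sup_{\theta \in \varphi} L(X, \theta),$

(7)    $\lambda(X) = P_\omega(X)/P_{\omega \cup \tau}(X),$

(8)    $\lambda^*(X) = P_\omega(X)/P_\tau(X).$

Since $\lambda^*$ is more expressive than $\lambda$ (that is, $\lambda = \lambda^*$ if $\lambda^* \le 1$ and $\lambda = 1$ if
$\lambda^* > 1$) it suffices to study the distribution of $\lambda^*$. We also define $\hat\theta_\varphi$ as that value of
$\theta$ in the closure of $\varphi$ which maximizes $L(X, \theta)$. Then $L(X, \hat\theta_\varphi) = P_\varphi(X)$.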


3. Examples involving the normal distribution. We shall present a few
examples where the observations $x$ have a multivariate normal distribution with
mean $\theta$ and known covariance matrix $\Sigma$. Since the sample mean is a sufficient
statistic for the mean of a normal distribution, it suffices to treat the case where
the sample size $n = 1$. For this case

(9)    $P_\varphi(x) = (2\pi)^{-k/2}\, |\Sigma|^{-1/2}\, e^{-Q_\varphi(x)/2}$

where $Q_\varphi(x) = \inf_{\theta \in \varphi} (x - \theta)' \Sigma^{-1} (x - \theta)$. Then

(10)    $-2 \log \lambda^*(x) = Q_\omega(x) - Q_\tau(x).$

In the special case where $\Sigma = I$, $Q_\omega(x)$ represents the squared distance from $x$
to $\omega$. In the more general case also, $Q_\omega(x)$ may be regarded as a squared distance,
but the distance is measured with respect to a non-Euclidean metric. We treat
the following examples of varying degrees of specialty.

EXAMPLE 1. Let $\omega = \{(\theta_1, \theta_2): a_1\theta_1 + a_2\theta_2 \le 0 \le b_1\theta_1 + b_2\theta_2\}$, and let the
complement of $\omega$ be $\tau$, and $\Sigma = I$. Because of the symmetric nature of $\lambda^*$, we
may assume that the angle $\varphi$ from the vector $(a_1, a_2)$ to the vector $(b_1, b_2)$ is
no more than 180°. Then $\omega$ represents a cone with vertex at the origin and with
angle $\varphi$ between the boundary lines. For $x \in \omega$, $-2 \log \lambda^*(x)$ is the negative of
the squared distance from $x$ to the boundary of $\omega$; for $x \in \tau$, $-2 \log \lambda^*(x)$ is
the squared distance from $x$ to the boundary of $\omega$ (Fig. 1). If $\theta = (0, 0)$, the
distribution of $-2 \log \lambda^*(x)$ depends only on $\varphi$. Let us call this distribution $G_\varphi$.

[Fig. 1 and Fig. 2]

We note that if $\varphi = 180°$, we may rotate the axes so that $\omega$ is the right half
plane. Then $-2 \log \lambda^*(x) = -x_1^2$ for $x_1 > 0$, and $= x_1^2$ for $x_1 \le 0$. Then $G_{180}$
is characterized by the density

$g(y) = \tfrac{1}{2}(2\pi)^{-1/2}\, e^{-|y|/2}\, |y|^{-1/2},$

which is related to that of $\chi^2$ in an obvious way.
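The half-plane case lends itself to a quick simulation. The following is a minimal sketch, assuming $\theta = (0, 0)$ and $\Sigma = I$, so that only $x_1 \sim N(0, 1)$ matters; it uses the relation $\lambda = \min(1, \lambda^*)$ from Section 2, under which $-2 \log \lambda$ equals $\max(0, -2 \log \lambda^*)$. The function names, seed and sample size are illustrative choices, not part of the paper.

```python
import random


def neg2_log_lambda_star(x1):
    """-2 log lambda* when omega is the right half plane and Sigma = I."""
    return -x1 * x1 if x1 > 0 else x1 * x1


def simulate(n_draws=20000, seed=0):
    """Monte Carlo summary of -2 log lambda under theta = (0, 0)."""
    rng = random.Random(seed)
    draws = [neg2_log_lambda_star(rng.gauss(0.0, 1.0)) for _ in range(n_draws)]
    # lambda = min(1, lambda*), hence -2 log lambda = max(0, -2 log lambda*).
    stat = [max(0.0, d) for d in draws]
    frac_zero = sum(1 for v in stat if v == 0.0) / n_draws
    positive = [v for v in stat if v > 0.0]
    mean_positive = sum(positive) / len(positive)
    return frac_zero, mean_positive


if __name__ == "__main__":
    frac_zero, mean_positive = simulate()
    print("fraction of zeros:", frac_zero)
    print("mean of the positive part:", mean_positive)
```

The fraction of zeros should be close to one half, and the mean of the positive part close to 1, the mean of $\chi^2$ with one degree of freedom, in line with the final sentence of the Summary.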

EXAMPLE 2. We alter Example 1 to let $\Sigma$ be a known positive definite symmetric
matrix, not necessarily $I$. Then there is a nonsingular matrix $T$ such
that $\Sigma^{-1} = T'T$. Let $y = Tx$, $d = Ta$, $e = Tb$. Our problem then is reduced to
that of the previous example where $\varphi$ is now the angle from $d$ to $e$. Since straight
lines go into straight lines under a linear transformation, it follows that $\varphi \ge 180°$
if and only if the angle from $a$ to $b$ is $\ge 180°$. Also

$\cos \varphi = \frac{d'e}{|d|\,|e|} = \frac{a'T'Tb}{\sqrt{(a'T'Ta)(b'T'Tb)}} = \frac{a'\Sigma^{-1}b}{\sqrt{(a'\Sigma^{-1}a)(b'\Sigma^{-1}b)}}.$

EXAMPLE 3. Let $\omega$ be the first quadrant except for the origin and $\tau$ be the set
consisting in the origin alone. Let $\Sigma = I$. Then $-2 \log \lambda^*(x)$ is

                   $x_1 \le 0$        $0 \le x_1$
    $0 \le x_2$      $-x_2^2$        $-(x_1^2 + x_2^2)$
    $x_2 \le 0$      $0$             $-x_1^2$

It is easily seen (Fig. 2) that if $\theta = (0, 0)$,

$P(2 \log \lambda^*(x) \le c) = \begin{cases} 0, & c < 0, \\ \tfrac{1}{4} + \tfrac{1}{2} P_1(c) + \tfrac{1}{4} P_2(c), & c \ge 0, \end{cases}$

where $P_1$ and $P_2$ are the c.d.f.'s for the $\chi^2$ distributions with one and two degrees
of freedom, respectively.
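The mixture formula of Example 3 can be compared with a direct simulation. This is a sketch under the assumptions of the example ($\theta = (0, 0)$, $\Sigma = I$), using the representation (10) with $Q_\tau$ the squared distance to the origin and $Q_\omega$ the squared distance to the closed first quadrant. The closed forms $P_1(c) = \mathrm{erf}(\sqrt{c/2})$ and $P_2(c) = 1 - e^{-c/2}$ are standard; the function names, seed and sample size are illustrative.

```python
import math
import random


def two_log_lambda_star(x1, x2):
    """2 log lambda* = Q_tau(x) - Q_omega(x) for Example 3 with Sigma = I."""
    q_tau = x1 * x1 + x2 * x2                          # distance^2 to origin
    q_omega = min(x1, 0.0) ** 2 + min(x2, 0.0) ** 2    # distance^2 to quadrant
    return q_tau - q_omega


def cdf_formula(c):
    """1/4 + (1/2) P_1(c) + (1/4) P_2(c) for c >= 0, and 0 for c < 0."""
    if c < 0:
        return 0.0
    p1 = math.erf(math.sqrt(c / 2.0))   # chi-square c.d.f., 1 degree of freedom
    p2 = 1.0 - math.exp(-c / 2.0)       # chi-square c.d.f., 2 degrees of freedom
    return 0.25 + 0.5 * p1 + 0.25 * p2


def cdf_monte_carlo(c, n_draws=50000, seed=1):
    """Empirical P(2 log lambda* <= c) from independent N(0, 1) coordinates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        if two_log_lambda_star(x1, x2) <= c:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0, 4.0):
        print(c, cdf_formula(c), cdf_monte_carlo(c))
```

The two columns printed for each $c$ should agree to within Monte Carlo error.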







4. Main result. We shall now treat the case where $\theta$ is a $k$-dimensional
parameter, and the density $f(x, \theta)$ satisfies the regularity conditions $\mathcal{R}$. We first
prove

LEMMA 1. If the origin is a limit point of $\varphi$ and $\hat\theta_\varphi \to 0$ when $\theta = 0$, then
$\hat\theta_\varphi = O_p(1/\sqrt{n})$ when $\theta = 0$.

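PROOF. Refer to equation (4) with $\theta$ replaced by $\hat\theta_\varphi$. For each $\epsilon > 0$ there is a
sequence $c_{n\epsilon} \to 0$ and a $K_\epsilon$ such that with probability greater than $1 - \epsilon$

$|\hat\theta_\varphi| < c_{n\epsilon}, \qquad |A| < K_\epsilon/\sqrt{n}, \qquad \sum_{i,j=1}^{k} (B_{ij} + J_{ij})^2 < c_{n\epsilon}^2,$

and the term represented by $|\hat\theta_\varphi|^3 O_p(1)$ is less than $K_\epsilon |\hat\theta_\varphi|^3$. When these
inequalities are satisfied there is a $K_\epsilon^*$ such that

$0 \le A'\hat\theta_\varphi + \tfrac{1}{2}\hat\theta_\varphi' B \hat\theta_\varphi + |\hat\theta_\varphi|^3 O_p(1) < -\tfrac{1}{2}\hat\theta_\varphi' J \hat\theta_\varphi + K_\epsilon^* \left( \frac{|\hat\theta_\varphi|}{\sqrt{n}} + c_{n\epsilon} |\hat\theta_\varphi|^2 \right).$

But then there is a $K_\epsilon^{**}$ such that $|\hat\theta_\varphi| < K_\epsilon^{**}/\sqrt{n}$. The lemma follows.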

DEFINITION 1. A set $C$ is positively homogeneous if $\theta \in C$ implies $a\theta \in C$ for
$a > 0$.

DEFINITION 2. The set $\varphi$ is approximated by the positively homogeneous set $C_\varphi$ if

$\inf_{x \in C_\varphi} |x - y| = o(|y|)$ for $y \in \varphi$ and $\inf_{y \in \varphi} |x - y| = o(|x|)$ for $x \in C_\varphi$.

We may remark that a set bounded by a smooth surface through the origin is
approximated by the union of an open half-space with an optional positively
homogeneous subset of the tangent hyperplane. It is also easy to see that if $\varphi$
is approximated by a nonnull positively homogeneous set other than the whole
space, then the origin is a boundary point of $\varphi$.

THEOREM 1. If
(1) the regularity conditions $\mathcal{R}$ are satisfied,
(2) the origin is a boundary point of $\varphi$ implies that $\hat\theta_\varphi \to 0$ in probability when
$\theta = 0$, and
(3) the sets $\omega$ and $\tau$ are approximated by nonnull and disjoint positively
homogeneous sets $C_\omega$ and $C_\tau$,
then, when $\theta = 0$, the asymptotic distribution of $\lambda^*$ is the same as it would be
for the test of $\theta \in C_\omega$ against $\theta \in C_\tau$ based on one observation from a population with
distribution $N(\theta, J^{-1})$.
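PROOF. Throughout this proof we assume that the "true" value of the
parameter is 0 and we use $\theta$ to represent the argument of the likelihood function.
Since $\hat\theta = J^{-1}A + o_p(1/\sqrt{n})$,

$\frac{1}{n} \log L(X, \hat\theta) = \frac{1}{n} \log L(X, 0) + \tfrac{1}{2} A' J^{-1} A + o_p(1/n).$

Let $\theta = J^{-1}A + \eta$, with $\eta = O_p(1/\sqrt{n})$. Then

$\frac{1}{n} \log L(X, \theta) = \frac{1}{n} \log L(X, 0) + \tfrac{1}{2} A' J^{-1} A - \tfrac{1}{2} \eta' J \eta + o_p(1/n).$

Applying Lemma 1 to $\omega$ and $\tau$,

$-2 \log \lambda^*(X) = n \left[ \inf_{\theta \in \omega} \eta' J \eta - \inf_{\theta \in \tau} \eta' J \eta \right] + o_p(1).$

Since $\varphi$ is approximated by $C_\varphi$,

$\inf_{\theta \in \varphi} (y - \theta)' J (y - \theta) = \inf_{\theta \in C_\varphi} (y - \theta)' J (y - \theta) + o(|y|^2)$

and therefore

$-2 \log \lambda^*(X) = n \left[ \inf_{\theta \in C_\omega} (J^{-1}A - \theta)' J (J^{-1}A - \theta) - \inf_{\theta \in C_\tau} (J^{-1}A - \theta)' J (J^{-1}A - \theta) \right] + o_p(1).$

Since $C_\omega$ and $C_\tau$ are positively homogeneous,

$-2 \log \lambda^*(X) = \inf_{\theta \in C_\omega} (z - \theta)' J (z - \theta) - \inf_{\theta \in C_\tau} (z - \theta)' J (z - \theta) + o_p(1),$

where $z = \sqrt{n}\, J^{-1} A$ and $d_\infty(z) = N(0, J^{-1})$. The function $\inf_{\theta \in C} (z - \theta)' J (z - \theta)$
is certainly a continuous function of $z$. From the results of Mann and
Wald [2] it follows that the asymptotic distribution of $-2 \log \lambda^*(X)$ is
precisely the distribution of

$\inf_{\theta \in C_\omega} (z - \theta)' J (z - \theta) - \inf_{\theta \in C_\tau} (z - \theta)' J (z - \theta)$

under the assumption that $d(z) = N(0, J^{-1})$. Referring to Section 3, we see that
this is precisely the result we seek.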


REMARK. The final sentence of the introduction is a simple consequence of
this theorem, together with the obvious extension of Example 1 in Section 3,
the fact that nonsingular linear transformations transform hyperplanes into
hyperplanes, the nature of the positively homogeneous approximation of a set
bounded by a smooth surface, and the relation of $\lambda$ to $\lambda^*$.


REFERENCES
[1] S. S. Wilks, "The large sample distribution of the likelihood ratio for testing
composite hypotheses," Ann. Math. Stat., Vol. 9 (1938), p. 60.
[2] H. B. Mann and A. Wald, "On stochastic limit and order relationships," Ann. Math.
Stat., Vol. 14 (1943), p. 217.
[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.





THE USE OF MAXIMUM LIKELIHOOD ESTIMATES IN $\chi^2$
TESTS FOR GOODNESS OF FIT¹


By Herman Chernoff and E. L. Lehmann
Stanford University and University of California


Summary. The usual test that a sample comes from a distribution of given
form is performed by counting the number of observations falling into specified
cells and applying the $\chi^2$ test to these frequencies. In estimating the parameters
for this test, one may use the maximum likelihood (or equivalent) estimate
based (1) on the cell frequencies, or (2) on the original observations. This paper
shows that in (2), unlike the well known result for (1), the test statistic does
not have a limiting $\chi^2$-distribution, but that it is stochastically larger than
would be expected under the $\chi^2$ theory. The limiting distribution is obtained
and some examples are computed. These indicate that the error is not serious
in the case of fitting a Poisson distribution, but may be so for the fitting of a
normal.


1. Introduction. When using $\chi^2$ for testing that a sample comes from a
distribution of specified functional form such as a Poisson or normal distribution,
the problem arises as to what estimates of the population parameters to use.
If only the numbers $m_i$ of observations falling into the $i$th of the $k$ cells are
available, there is no difficulty. Let $p_i$ $(i = 1, \cdots, k)$ denote the probability of
an observation falling into the $i$th cell, and let $\hat p_i$ be any best asymptotically
normal (b.a.n.) estimate of $p_i$ such as the minimum $\chi^2$ or maximum likelihood
estimate. Then it is known [1], [2] that under suitable regularity conditions the
asymptotic distribution of

(1)    $\hat R = \sum (m_i - n\hat p_i)^2 / n\hat p_i$

is that of $\chi^2$ with $k - s - 1$ degrees of freedom, where $s$ is the number of
(independent) population parameters being estimated.

If, however, the original observations $x_1, \cdots, x_n$ are available, one is tempted
to use more efficient estimates, such as the maximum likelihood estimates $\hat{\hat p}_i$
based on all the data. One may reasonably expect this procedure to provide
more powerful tests than those based only on the $m_i$; at the same time the
estimates usually are simpler and easier to obtain. This is in fact the procedure
recommended in many textbooks, particularly for the fitting of Poisson
distributions, either as an approximation to the one with known theory described
above, or more often without comment.

Received 7/24/53.
¹ This work was prepared with the partial support of the Office of Naval Research.

It is the purpose of the present paper to obtain the distribution of

(2)    $\hat{\hat R} = \sum (m_i - n\hat{\hat p}_i)^2 / n\hat{\hat p}_i,$

which differs from that of $\hat R$. If we let

(3)    $R = \sum (m_i - np_i)^2 / np_i,$

which has a limiting $\chi^2$-distribution with $k - 1$ degrees of freedom, we shall
show that the limiting distribution of $\hat{\hat R}$ lies between those of $\hat R$ and of $R$. More
specifically, we shall show in Section 3 that under suitable regularity conditions
we have

THEOREM 1. The asymptotic distribution of $\hat{\hat R}$ is that of

(4)    $\sum_{i=1}^{k-s-1} y_i^2 + \sum_{i=k-s}^{k-1} \lambda_i y_i^2$

where the $y_i$ are independently normally distributed with mean zero and unit
variance, and the $\lambda_i$ are between 0 and 1 and may depend on the $s$ parameters
$\theta_1, \cdots, \theta_s$.

This result indicates that the recommended procedure of rejecting the
hypothesis of goodness of fit when $\hat{\hat R} > C$, where $C$ is obtained from the
$\chi^2$-distribution with $k - s - 1$ degrees of freedom, will lead to a probability of
rejection which, when the hypothesis is true, is greater than the desired level of
significance. However, a numerical investigation of a few special cases indicates
that, at least in the Poisson problem, this excess of probability of type I error
will be so small as not to be serious. The situation appears to be not quite so
favorable in the normal case.

Throughout this paper, the notation and background material given in
Section 2 of the preceding paper [3] will be used.


2. Example. Before proceeding to the main result, let us treat the special
example where the observations are independently and normally distributed
with unknown mean $\mu$ and variance 1, and where the cells are $(-\infty, 0)$ and
$(0, \infty)$. In this case it is obvious that $\hat R = 0$. However,

$\hat{\hat R} = \sum_{i=1}^{2} \frac{(m_i - n\hat{\hat p}_i)^2}{n\hat{\hat p}_i} = \frac{(m_1 - n\hat{\hat p}_1)^2}{n\hat{\hat p}_1 \hat{\hat p}_2}$

where

$p_1(\mu) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}}\, e^{-(x-\mu)^2/2}\,dx, \qquad p_2(\mu) = 1 - p_1(\mu), \qquad \hat{\hat p}_i = p_i(\bar x).$

We have

$\hat{\hat R} = \frac{1}{p_1 p_2 + o_p(1)} \left( \frac{m_1 - np_1}{\sqrt{n}} + \frac{np_1 - n\hat{\hat p}_1}{\sqrt{n}} \right)^2 = \frac{(\epsilon - \nu)^2}{p_1 p_2 + o_p(1)}$

where $\epsilon = (m_1 - np_1)/\sqrt{n}$ and $\nu = \sqrt{n}\,(\hat{\hat p}_1 - p_1)$. Using the first order Taylor
expansion of $p_1(\bar x)$ about $p_1(\mu)$, we have

$\nu = -\sqrt{n}\,(\bar x - \mu)\, e^{-\mu^2/2}/\sqrt{2\pi} + o_p(1) = \nu' + o_p(1)$

where $\nu' = -\sqrt{n}\,(\bar x - \mu)\, e^{-\mu^2/2}/\sqrt{2\pi}$. Let $g(x) = 1$ for $x < 0$, and $= 0$
otherwise. The central limit theorem tells us that, since

$(m_1, n\bar x) = \sum_{\alpha=1}^{n} (g(x_\alpha), x_\alpha), \qquad d_\infty(\epsilon, \nu') = N(0, \Sigma)$

where $N(0, \Sigma)$ denotes the normal distribution with mean 0 and covariance
matrix $\Sigma$ given by

$\Sigma = \begin{pmatrix} p_1 p_2 & e^{-\mu^2}/2\pi \\ e^{-\mu^2}/2\pi & e^{-\mu^2}/2\pi \end{pmatrix}.$

Hence

$d_\infty(\epsilon, \nu) = N(0, \Sigma), \qquad d_\infty(\epsilon - \nu) = N(0,\; p_1 p_2 - e^{-\mu^2}/2\pi)$

and in particular $\epsilon - \nu = O_p(1)$. It follows that $d_\infty(\hat{\hat R}) = d(\lambda y^2)$, where $d(y) =
N(0, 1)$ and $\lambda = 1 - e^{-\mu^2}/2\pi p_1 p_2 < 1$. The fact that $\lambda \ge 0$ follows from the fact
that $\Sigma$ is nonnegative definite.

NOTE. A general proof of Theorem 1 cannot be based solely on the fact that
$\hat{\hat p}_i$ is a better estimate of $p_i$ than $\hat p_i$ is. Suppose, in fact, that we use $p_1^* = p_1(2 - \bar x)$
as our estimate of $p_1$. In the event that $\mu = 1$, $p_1^*$ has the same distribution
as $\hat{\hat p}_1$. The above argument repeated for $R^*$ shows that $\nu$ would be replaced by
$-\nu$ and $\lambda$ by $\lambda^* = 1 + 3e^{-\mu^2}/2\pi p_1 p_2$.
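The limiting factor $\lambda$ of this example is easy to evaluate and to check by simulation. The sketch below is illustrative only: it assumes the statistic is computed as $(m_1 - n p_1(\bar x))^2 / \{n\, p_1(\bar x)\, p_2(\bar x)\}$ and compares the simulated mean of that statistic with $\lambda$, the mean of $\lambda y^2$ for $y \sim N(0, 1)$. The function names, sample size, number of repetitions and seed are arbitrary choices.

```python
import math
import random


def normal_cdf(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def limiting_lambda(mu):
    """lambda = 1 - exp(-mu^2) / (2 pi p1 p2), with p1 = P(x < 0)."""
    p1 = normal_cdf(-mu)
    p2 = 1.0 - p1
    return 1.0 - math.exp(-mu * mu) / (2.0 * math.pi * p1 * p2)


def simulate_mean_statistic(mu=0.0, n=200, reps=2000, seed=2):
    """Average of the two-cell statistic using the estimate p1(x-bar)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        m1 = sum(1 for x in xs if x < 0)      # count in the cell (-inf, 0)
        xbar = sum(xs) / n
        p1_hat = normal_cdf(-xbar)            # p1 evaluated at the sample mean
        total += (m1 - n * p1_hat) ** 2 / (n * p1_hat * (1.0 - p1_hat))
    return total / reps


if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.0, 2.0):
        print(mu, limiting_lambda(mu))
    print(simulate_mean_statistic(), limiting_lambda(0.0))
```

At $\mu = 0$ the formula gives $\lambda = 1 - 2/\pi$, so the statistic is far from the degenerate value 0 that the $\chi^2$ theory with $k - s - 1 = 0$ degrees of freedom would suggest.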

3. The general case. We shall now prove Theorem 1 under the following
regularity conditions:

(i) The $p_i(\theta)$ satisfy the condition on pages 426-427 of Cramér's
Mathematical Methods of Statistics.

(ii) Let $z = (z_1, \cdots, z_k)$ where $z_i = 1$ if the observation falls in the $i$th cell
and 0 otherwise. Let $f(z, \theta) = \prod p_i^{z_i}$ and let us assume that the value $w$ of our
chance variable $x$ determines $z$, and that the density of $x$ is given by

$f^*(w, \theta) = \prod p_i^{z_i}\; g(w \mid z, \theta)$

where $g$ is the conditional density of $x$ given $z$. Then we assume that $f^*$ satisfies
the condition $\mathcal{R}$ of the preceding paper [3].
Let

(5)    $m_i - np_i = \sqrt{np_i}\; \epsilon_i,$

(6)    $n(\hat p_i - p_i) = \sqrt{np_i}\; \hat\nu_i,$

(7)    $n(\hat{\hat p}_i - p_i) = \sqrt{np_i}\; \hat{\hat\nu}_i.$

Then

(8)    $R = \sum \epsilon_i^2 = \epsilon'\epsilon,$

(9)    $\hat R = \sum (\epsilon_i - \hat\nu_i)^2\, [1 + o_p(1)],$

(10)    $\hat{\hat R} = \sum (\epsilon_i - \hat{\hat\nu}_i)^2\, [1 + o_p(1)].$







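We shall first compute $\hat\nu_i$ to show that $\hat R$ is asymptotically a sum of squares
of the components of a normally distributed chance variable, and then do the
same for $\hat{\hat R}$. We have

$\frac{\partial \log f(z, \theta)}{\partial \theta_j} = \sum_{i=1}^{k} \frac{z_i}{p_i} \frac{\partial p_i}{\partial \theta_j}.$

The information matrix referred to in Section 2 of [3] is given by

(11)    $\hat J = \left\| \sum_{r=1}^{k} \frac{1}{p_r} \frac{\partial p_r}{\partial \theta_i} \frac{\partial p_r}{\partial \theta_j} \right\| = D'D$

where

(12)    $D = \left\| \frac{1}{\sqrt{p_i}} \frac{\partial p_i}{\partial \theta_j} \right\|.$

The corresponding $A$ vector $\hat A$ has elements

$\hat A_j = \frac{1}{n} \sum_{r=1}^{k} \frac{m_r}{p_r} \frac{\partial p_r}{\partial \theta_j} = \frac{1}{n} \sum_{r=1}^{k} \frac{m_r - np_r}{p_r} \frac{\partial p_r}{\partial \theta_j}.$

Therefore

(13)    $\hat A = (1/\sqrt{n})\, D'\epsilon$

and

$\hat\nu_i = \frac{\sqrt{n}\,(\hat p_i - p_i)}{\sqrt{p_i}} = \sum_{j=1}^{s} \sqrt{n}\,(\hat\theta_j - \theta_j)\, \frac{1}{\sqrt{p_i}} \frac{\partial p_i}{\partial \theta_j} + o_p(1),$

$\hat\nu = D \sqrt{n}\,(\hat\theta - \theta) + o_p(1) = D \hat J^{-1} D' \epsilon + o_p(1).$

Finally

(14)    $\hat R = (\hat F \epsilon)'(\hat F \epsilon) + o_p(1)$

where

(15)    $\hat F = I - D \hat J^{-1} D'.$

(16)    $\frac{\partial \log f^*(w, \theta)}{\partial \theta_j} = \sum_{i=1}^{k} \frac{z_i}{p_i} \frac{\partial p_i}{\partial \theta_j} + \frac{\partial \log g(w \mid z, \theta)}{\partial \theta_j}.$

Since the conditional expectation, given $z$, of

$\left[ \sum_{i=1}^{k} \frac{z_i}{p_i} \frac{\partial p_i}{\partial \theta_j} \right] \frac{\partial \log g(w \mid z, \theta)}{\partial \theta_l}$

is zero, we have

(17)    $\hat{\hat J} = \hat J + J^*,$

(18)    $\hat{\hat A} = \hat A + A^*,$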



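where

(19)    $J^* = \left\| E\left\{ \frac{\partial \log g(x \mid z, \theta)}{\partial \theta_i}\, \frac{\partial \log g(x \mid z, \theta)}{\partial \theta_j} \right\} \right\|,$

(20)    $A^*_j = \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial \log g(x_\alpha \mid z^{(\alpha)}, \theta)}{\partial \theta_j},$

and $z^{(\alpha)}$ is the $\alpha$th observation on $z$. Now

$\hat{\hat\nu}_i = \frac{\sqrt{n}\,(\hat{\hat p}_i - p_i)}{\sqrt{p_i}} = \sum_{j=1}^{s} \sqrt{n}\,(\hat{\hat\theta}_j - \theta_j)\, \frac{1}{\sqrt{p_i}} \frac{\partial p_i}{\partial \theta_j} + o_p(1),$

(21)    $\hat{\hat\nu} = D \sqrt{n}\,(\hat{\hat\theta} - \theta) + o_p(1) = D(\hat J + J^*)^{-1}(D'\epsilon + \sqrt{n}\, A^*) + o_p(1).$

Hence

(22)    $\hat{\hat R} = (\hat{\hat F}\epsilon + G\eta)'(\hat{\hat F}\epsilon + G\eta) + o_p(1)$

where $\eta = \sqrt{n}\, A^*$, while $\hat{\hat F} = I - D(\hat J + J^*)^{-1} D'$ and $G = -D(\hat J + J^*)^{-1}$.
The asymptotic distributions of $R$, $\hat R$, and $\hat{\hat R}$ are those of $\epsilon'\epsilon$, $(\hat F\epsilon)'(\hat F\epsilon)$, and
$(\hat{\hat F}\epsilon + G\eta)'(\hat{\hat F}\epsilon + G\eta)$, respectively. To find these distributions we must know
the asymptotic distribution of $(\epsilon, \eta)$. Applying the central limit theorem to

$\left( z_1, \cdots, z_k,\; \frac{\partial \log g(w \mid z, \theta)}{\partial \theta_1}, \cdots, \frac{\partial \log g(w \mid z, \theta)}{\partial \theta_s} \right)$

we see that

(23)    $d_\infty(\epsilon, \eta) = N\left( 0, \begin{pmatrix} I - qq' & 0 \\ 0 & J^* \end{pmatrix} \right)$

where $q$ is the vector whose $i$th component is $\sqrt{p_i}$. (Note that $D'q = 0$.)

From one of the Mann-Wald results it follows that the asymptotic distributions
we desire are those obtained by assuming that $(\epsilon, \eta)$ actually have the
above joint normal distribution. That is we assume that

$d(\epsilon) = N(0, \Sigma), \qquad d(\hat F\epsilon) = N(0, \hat\Sigma), \qquad d(\hat{\hat F}\epsilon + G\eta) = N(0, \hat{\hat\Sigma})$

where

(24)    $\Sigma = I - qq',$

(25)    $\hat\Sigma = I - qq' - D \hat J^{-1} D',$

(26)    $\hat{\hat\Sigma} = I - qq' - D(\hat J + J^*)^{-1} D'.$

If for symmetric matrices we write $K \ge L$ whenever $K - L$ is nonnegative
definite, then

(27)    $\Sigma \ge \hat{\hat\Sigma} \ge \hat\Sigma.$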





We digress to present

LEMMA 1. If $d(y) = N(0, U)$ where the characteristic roots of $U$ are $\lambda_1, \lambda_2,
\cdots, \lambda_k$, then

$d(y'y) = d\left( \sum \lambda_i z_i^2 \right)$

where $d(z) = N(0, I)$.

PROOF. Expressing $U$ in canonical form, we have $U = P \Lambda P'$, where $P$ is
orthogonal and $\Lambda$ is the diagonal matrix whose diagonal elements are the $\lambda_i$.
Since $U$ is nonnegative definite, the $\lambda_i$ are nonnegative and we may define $\Lambda^{1/2}$ in
the obvious way. Let $d(z) = N(0, I)$ and $y^* = P \Lambda^{1/2} z$. Then $d(y^*) = N(0, U)$
and $d(y^{*\prime} y^*) = d(y'y)$. But

$y^{*\prime} y^* = z' \Lambda^{1/2} P' P \Lambda^{1/2} z = z' \Lambda z$

and the lemma follows.

As a consequence of this lemma, it follows that the distributions of $R$, $\hat R$, and
$\hat{\hat R}$ are those of $z'\Lambda z$, $z'\hat\Lambda z$, and $z'\hat{\hat\Lambda} z$ where $\Lambda$, $\hat\Lambda$, $\hat{\hat\Lambda}$ are the diagonal matrices of
characteristic values corresponding to $R$, $\hat R$, and $\hat{\hat R}$, respectively. From the
known results on $R$ and $\hat R$ it follows that $\Sigma$ has for characteristic roots $k - 1$
ones and 1 zero, while $\hat\Sigma$ has for characteristic roots $k - s - 1$ ones and $s + 1$
zeros. Since $\Sigma \ge \hat{\hat\Sigma} \ge \hat\Sigma$, it follows that $\hat{\hat\Sigma}$ has for characteristic roots: $k - s - 1$
ones, 1 zero, and $s$ roots $\lambda_1, \lambda_2, \cdots, \lambda_s$ between zero and one. Our Theorem
1 follows.

Remark. A direct proof of the above-mentioned properties of the characteristic roots of $\Sigma$ and $\tilde\Sigma$ may be given by showing that $qq'$ and $DJ^{-1}D'$ are projection operators on orthogonal manifolds of dimensions 1 and $s$ respectively, that is,

$$(qq')(qq') = q\left(\sum p_i\right)q' = qq',$$

$$(DJ^{-1}D')(DJ^{-1}D') = DJ^{-1}JJ^{-1}D' = DJ^{-1}D',$$

$$(DJ^{-1}D')(qq') = 0.$$


The roots $\lambda_1, \cdots, \lambda_s$ which determine the distribution of the test criterion $\hat R$ can be obtained from

Theorem 2. If $\mu_i = 1 - \lambda_i$, then the $\mu_i$ are the characteristic roots of the determinantal equation $|J - \mu(J + J^{*})| = 0$.

Proof. We shall use the fact that if the vectors $t_1, \cdots, t_k$ form an orthonormal basis of $k$-dimensional space, then the matrix $\sum r_i t_i t_i'$ has the characteristic roots $r_1, \cdots, r_k$. This implies, in particular, that $\sum t_i t_i'$ is the identity matrix.

Given $J$ and $J + J^{*}$, there exists a nonsingular $(s \times s)$ matrix $S$ and a diagonal matrix $M$ such that

$$J^{-1} = SS', \qquad (J + J^{*})^{-1} = SMS'$$







where the diagonal elements of $M$ are the roots of $|(J + J^{*})^{-1} - \mu J^{-1}| = 0$ and hence of $|J - \mu(J + J^{*})| = 0$, and are all between 0 and 1, since $J \le J + J^{*}$.

If $u_1, \cdots, u_s$ are the columns of $DS$,

$$DJ^{-1}D' = (DS)(DS)' = \sum_{i=1}^{s} u_i u_i'.$$

Since $(DS)'(DS) = S'JS = I$ and $D'q = 0$, it follows that $q, u_1, \cdots, u_s$ are mutually orthogonal unit vectors. If we let $v_1, \cdots, v_{k-s-1}$ be a complementary set of orthogonal unit vectors we have

$$\hat\Sigma = I - qq' - DSMS'D' = I - qq' - \sum_{i=1}^{s} \mu_i u_i u_i' = \sum_{j=1}^{k-s-1} v_j v_j' + \sum_{i=1}^{s} (1 - \mu_i) u_i u_i'.$$

It follows that the characteristic roots of $\hat\Sigma$ consist of $k - s - 1$ ones, one zero, and $\lambda_i = 1 - \mu_i$ for $i = 1, \cdots, s$.
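The statement of Theorem 2 can be illustrated on an arbitrary numerical instance. This is only a sketch under stated assumptions: the cell probabilities `p`, the matrix `D` (forced to satisfy $D'q = 0$) and the positive definite `J_star` are generated at random and are hypothetical, and the function names are not from the paper.

```python
import numpy as np

def sigma_hat_roots(p, D, J_star):
    """Characteristic roots of I - qq' - D (J + J*)^{-1} D', and the roots mu of |J - mu (J + J*)| = 0."""
    q = np.sqrt(p)
    J = D.T @ D
    K = np.linalg.inv(J + J_star)
    sigma_hat = np.eye(len(p)) - np.outer(q, q) - D @ K @ D.T
    roots = np.sort(np.linalg.eigvalsh(sigma_hat))
    # mu solves |J - mu (J + J*)| = 0, i.e. mu is a root of (J + J*)^{-1} J
    mu = np.sort(np.linalg.eigvals(np.linalg.solve(J + J_star, J)).real)
    return roots, mu

def random_instance(k=6, s=2, seed=0):
    """Hypothetical inputs: k cells, s parameters, D'q = 0, J* positive definite."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.ones(k))
    q = np.sqrt(p)
    raw = rng.normal(size=(k, s))
    D = raw - np.outer(q, q @ raw)        # remove the component along q
    B = rng.normal(size=(s, s))
    J_star = B @ B.T + 0.1 * np.eye(s)
    return p, D, J_star

k, s = 6, 2
roots, mu = sigma_hat_roots(*random_instance(k, s))
print(roots)             # one zero, s roots in (0, 1), k - s - 1 ones
print(np.sort(1 - mu))   # matches the s intermediate roots
```

Sorting the roots in increasing order places the single zero first, the $s$ roots $1 - \mu_i$ next, and the $k - s - 1$ ones last, which is the pattern the theorem describes.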


4. Some Numerical Examples. By using the maximum likelihood estimates based on the full sample, one is operating at a higher significance level than the one stated. One can, however, on the basis of the above results, make an adjustment which asymptotically provides the correct value.

Given $\theta$, let $C(\theta)$ be such that

$$P\left\{\sum_{i=1}^{k-s-1} y_i^2 + \sum_{i=k-s}^{k-1} \lambda_i(\theta)\, y_i^2 \ge C(\theta)\right\} = \alpha.$$

Clearly $C(\theta)$ is a continuous function of $\theta$ and hence $C(\hat\theta) \to C(\theta)$ in probability as $n \to \infty$. It follows that the probability of

$$\sum_{i=1}^{k-s-1} y_i^2 + \sum_{i=k-s}^{k-1} \lambda_i(\hat\theta)\, y_i^2 \ge C(\hat\theta)$$

tends to $\alpha$ as $n \to \infty$. Here $C(\hat\theta)$ can be computed, at least in theory, to an arbitrary degree of accuracy using the results of Pitman and Robbins [4].
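As a rough illustration of the adjustment, the critical value $C$ can be approximated by simulation once the roots $\lambda_i$ are given. The sketch below uses plain Monte Carlo, not the Pitman-Robbins expansion cited in the text, and assumes that the $y_i$ are independent standard normal variables; the function name `critical_value` and its parameters are hypothetical.

```python
import numpy as np

def critical_value(lams, n_ones, alpha=0.05, n_draws=400_000, seed=0):
    """Monte Carlo estimate of C with P{sum of n_ones y_i^2 + sum lam_i y_i^2 >= C} = alpha."""
    rng = np.random.default_rng(seed)
    weights = np.concatenate([np.ones(n_ones), np.asarray(lams, dtype=float)])
    y2 = rng.standard_normal((n_draws, weights.size)) ** 2   # independent chi-square(1) draws
    return float(np.quantile(y2 @ weights, 1.0 - alpha))

# k - s - 1 = 2 unit roots and one extra root lambda
c_no_extra = critical_value([0.0], n_ones=2)   # close to the chi-square(2) 5 per cent point
c_half = critical_value([0.5], n_ones=2)
c_full = critical_value([1.0], n_ones=2)       # close to the chi-square(3) 5 per cent point
print(c_no_extra, c_half, c_full)
```

Because the same seed is reused, the three estimates are computed from the same draws, so they increase with $\lambda$ exactly as the population critical values do.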

Theoretically, the error committed by using the maximum likelihood estimates based on the full sample without an adjustment can be quite serious in the case of a small number of cells. For example, if $s = 1$ and $\lambda(\theta)$ is close to 1, we have essentially one extra degree of freedom, and when the number $k$ of cells is small so that $k - 2 = 1$, 2 or 3, the actual probability $\alpha^{*}$ of type I error would vary from 15 per cent to 10 per cent when the level of significance is supposed to be $\alpha = 5$ per cent.
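The figures of 15 and 10 per cent can be reproduced by taking the 5 per cent point of $\chi^2$ with $k - 2$ degrees of freedom and evaluating the upper tail of $\chi^2$ with $k - 1$ degrees of freedom at that point. The sketch below uses only the Python standard library; the helper names are illustrative, and the limiting case $\lambda = 1$ is an assumption made to get a closed form.

```python
import math

def chi2_sf(x, df):
    """Upper tail P{chi-square_df >= x} for integer df >= 1, from the finite-sum formulas."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    m = (df - 1) // 2
    tail = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail

def chi2_critical(alpha, df, hi=200.0):
    """Solve chi2_sf(x, df) = alpha for x by bisection."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def actual_level(df_nominal, alpha=0.05):
    """Type I error when the statistic has one more degree of freedom than assumed."""
    return chi2_sf(chi2_critical(alpha, df_nominal), df_nominal + 1)

for df in (1, 2, 3):
    print(df, round(actual_level(df), 3))
```

For $k - 2 = 1, 2, 3$ the computed levels are roughly .147, .112 and .099, in line with the range quoted in the text.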

In practice, however, at least for fitting a Poisson distribution, the error does not appear to be so serious. Some values of $\lambda(\theta)$ and the true probability of rejection $\alpha^{*}(\theta)$ in the Poisson case are given below for groupings $z = 0, 1, \ge 2$ and $z = 0, 1, 2, \ge 3$, and level of significance supposed to be $\alpha = .05$.


0,1,2, 23 


(8) » i .od .14 On 
$\alpha^{*}(\theta)$   .054   .067   .055   .065


As a second example, consider the fitting of a normal distribution with mean $\xi$ and variance $\sigma^2$, both unknown. For the case of the four cells $(-\infty, -1)$, $(-1, 0)$, $(0, 1)$, $(1, \infty)$ and two combinations of $\xi$ and $\sigma$ we obtain the following values for the two roots $\lambda_1$ and $\lambda_2$:


80, = .20 
=» 74, =. 
The probability $\alpha^{*}$ is then given by

$$\alpha^{*} = P\{U + \lambda_1 V + \lambda_2 W \ge C_{\alpha}\}$$

where $U$, $V$, $W$ are $\chi^2$ variables with 1 degree of freedom and $C_{\alpha}$ is such that $P\{U \ge C_{\alpha}\} = \alpha$. As a lower bound of $\alpha^{*}$ in the first case we have computed $P\{U + .8V \ge C_{\alpha}\} = .12$. This indicates that in the normal case the use of maximum likelihood estimates in $\chi^2$ may lead to a more serious underestimate of the probability of type I error.
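The lower bound quoted above can be checked by simulation. The following sketch assumes that $U$, $V$, $W$ are independent (as implied by the construction from independent normal variables) and takes $C_{\alpha} = 3.841459$ for $\alpha = .05$; the function name `rejection_probability` is hypothetical.

```python
import numpy as np

C_ALPHA = 3.841459  # upper 5 per cent point of chi-square with 1 degree of freedom

def rejection_probability(weights, c=C_ALPHA, n_draws=400_000, seed=0):
    """Monte Carlo estimate of P{sum_i weights_i * chi-square_1 >= c}, independent terms assumed."""
    rng = np.random.default_rng(seed)
    chi2_1 = rng.standard_normal((n_draws, len(weights))) ** 2
    return float(np.mean(chi2_1 @ np.asarray(weights, dtype=float) >= c))

nominal = rejection_probability([1.0])             # should be near .05
lower_bound = rejection_probability([1.0, 0.8])    # P{U + .8V >= C_alpha}
first_case = rejection_probability([1.0, 0.8, 0.2])
print(nominal, lower_bound, first_case)
```

The estimate for $U + .8V$ comes out close to the reported .12, and adding the third term $.2W$ can only increase the probability, which is why the text calls .12 a lower bound.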


REFERENCES

[1] J. Neyman, "Contribution to the theory of the $\chi^2$-test," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1949.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] H. Chernoff, "On the distribution of the likelihood ratio," Ann. Math. Stat., Vol. 25 (1954), pp. 573-578.

[4] E. J. G. Pitman and H. Robbins, "Application of the method of mixtures to quadratic forms in normally correlated variables," Ann. Math. Stat., Vol. 20 (1949), pp. 552-560.





AN EXTENSION OF MASSEY'S DISTRIBUTION OF THE MAXIMUM
DEVIATION BETWEEN TWO-SAMPLE CUMULATIVE
STEP FUNCTIONS¹


By Chia Kuei Tsao
Wayne University


1. Summary and Introduction. Let $x_1 < x_2 < \cdots < x_n$ and $y_1 < y_2 < \cdots < y_m$ be the ordered results of two random samples from populations having continuous cumulative distribution functions $F(x)$ and $G(x)$ respectively. Let $S_n(x) = k/n$, where $k$ is the number of observations of $X$ which are less than or equal to $x$, and $S_m(x) = j/m$, where $j$ is the number of observations of $Y$ which are less than or equal to $x$.

The statistics

$$d_r = \max_{x \le x_r} |S_n(x) - S_m(x)|,$$

$$d_r' = \max_{x \le \max(x_r,\, y_r)} |S_n(x) - S_m(x)|, \qquad r \le \min(m, n),$$

can be used to test the hypothesis $F(x) = G(x)$. For example, using $d_r$ we would reject the hypothesis if the observed $d_r$, that is, the maximum absolute deviation between the two step functions at or below the $r$th observation of a given sample, is significantly large.

In this paper, the distributions of $d_r$ and $d_r'$ under the hypothesis $F(x) = G(x)$ are obtained and tabulated. Some possible applications are discussed and a numerical example in life testing is given.


2. Distribution of $d_r$. Denote by $m_1$ the number of observed values of $Y$ which are less than $x_1$; by $m_2$ the number of values of $Y$ which are between $x_1$ and $x_2$, $\cdots$; by $m_r$ the number of values of $Y$ which are between $x_{r-1}$ and $x_r$; and by $M$ the number of values of $Y$ which are greater than $x_r$. If the hypothesis $F(x) = G(x)$ is true, the probability of the occurrence of a set of $m_1, m_2, \cdots, m_r, M$ is

$$\Pr(m_1, \cdots, m_r, M) = \binom{M+n-r}{M}\bigg/\binom{m+n}{n}.$$

The formula is a special case of the general probability formula (3) in [3]. This formula depends only on $M$, that is, it is independent of $m_1, m_2, \cdots, m_r$. Thus, for any given $M$, the probability that $d_r \le a$ can be found by counting the number of sets of $m_1, m_2, \cdots, m_r$ which give values of $d_r \le a$. Denote this number of sets by $K_{r,M}(a)$, then

$$\Pr(d_r \le a) = \sum_{M=0}^{m} K_{r,M}(a)\binom{M+n-r}{M}\bigg/\binom{m+n}{n}.$$

The method of counting $K_{r,M}(a)$ is essentially the same as that of Massey [2]. As an illustration, suppose $m = n$, then $S_m(x)$ and $S_n(x)$ can differ only by multiples of $1/m$. For any integer $c$ and any given $M$, $K_{r,M}(c/m)$ may be counted

Received 7/20/53. 
¹ Work supported by the Office of Naval Research.









as follows. Let $V_{i,j}(c)$, $i = 1, 2, \cdots, r$; $j = 1, 2, \cdots, 2c$, be the number of possible sets of $m_1, m_2, \cdots, m_i$ such that $S_m(x_i) = (i + j - c - 1)/m$ and such that $d_i \le c/m$. Then it is evident that these $V_{i,j}(c)$ satisfy the $2rc$ difference equations:

$$V_{i,k}(c) = \sum_{j=1}^{k+1} V_{i-1,j}(c), \qquad i = 1, 2, \cdots, r, \quad k = 1, 2, \cdots, 2c,$$

where $V_{0,c+1}(c) = 1$ and $V_{0,j}(c) = 0$ for $j \ne c + 1$, and $V_{i,2c+1}(c) = 0$ for $i = 1, 2, \cdots, r - 1$.

Since $S_m(x_r) = (r + j - c - 1)/m$, we must have $m - M = r + j - c - 1$, or $j = m - M - r + c + 1$, for any given $M$. Consequently we obtain

$$K_{r,M}(c/m) = V_{r,\,m-M-r+c+1}(c).$$

We note that $K_{r,M}(c/m) = 0$ for $m - M - r + c + 1$ outside the range $(1, 2c)$.
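The difference equations lend themselves to direct computation. The following sketch, for equal sample sizes $m = n$ and an integer $c \ge 1$, evaluates $\Pr(d_r \le c/m)$ from the recursion for $V_{i,j}(c)$ and the binomial weights given earlier in this section; the function names are illustrative and not from the paper.

```python
from math import comb

def count_paths(r, c):
    """Return v with v[j] = V_{r,j}(c) for j = 1..2c (entries 0 and 2c+1 are zero padding)."""
    v = [0] * (2 * c + 2)
    v[c + 1] = 1                          # V_{0,c+1}(c) = 1, all other V_{0,j}(c) = 0
    for _ in range(r):
        nxt = [0] * (2 * c + 2)
        for k in range(1, 2 * c + 1):
            nxt[k] = sum(v[1:k + 2])      # V_{i,k} = sum_{j=1}^{k+1} V_{i-1,j}
        v = nxt                           # nxt[2c+1] stays 0, the boundary condition
    return v

def prob_d_r_at_most(m, r, c):
    """Pr(d_r <= c/m) under F = G for equal sample sizes m = n, with c >= 1."""
    v = count_paths(r, c)
    total = 0
    for M in range(m + 1):
        j = m - M - r + c + 1             # index fixed by S_m(x_r) = (m - M)/m
        if 1 <= j <= 2 * c:               # K_{r,M}(c/m) = 0 outside this range
            total += v[j] * comb(M + m - r, M)
    return total / comb(2 * m, m)

print([round(prob_d_r_at_most(3, 2, c), 5) for c in (1, 2, 3)])
```

For $m = n = 3$ and $r = 2$ this gives .5, .95 and 1, which agrees with the row .50000, .95000, 1.00000 of Table I; larger cases can be evaluated in the same way.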


3. Distribution of $d_r'$. In testing the hypothesis $F(x) = G(x)$, the critical region consists of $d_r > c_{\alpha}/m$, where $\Pr(d_r > c_{\alpha}/m) = \alpha$. If, however, $r < c_{\alpha}$, then the criterion $d_r$ reduces to, say

d, = min [S,(z) — S,(z)], 
since, in this case, $S_n(x) - S_m(x)$ can never be greater than $c_{\alpha}/m$. As a result, the test becomes a one-sided test in the sense that the null hypothesis will be rejected only when too many $y$'s are less than $x_i$, $i = 1, 2, \cdots, r$. One way of always getting a two-sided test is to use the statistic $d_r'$ which is symmetric with respect to the two samples.

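Defining $d_r^{*}$ as

$$d_r^{*} = \max_{x \le y_r} |S_n(x) - S_m(x)|,$$

we have

$$\Pr(d_r' \le a) = \Pr(d_r \le a,\ x_r > y_r) + \Pr(d_r^{*} \le a,\ x_r < y_r)$$

$$= \sum_{M=0}^{m-r} K_{r,M}(a)\binom{M+n-r}{M}\bigg/\binom{m+n}{n} + \sum_{N=0}^{n-r} K^{*}_{r,N}(a)\binom{N+m-r}{N}\bigg/\binom{m+n}{m}.$$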
If $m = n$, then

$$\Pr(d_r' \le a) = 2\sum_{M=0}^{m-r} K_{r,M}(a)\binom{M+n-r}{M}\bigg/\binom{m+n}{n}.$$


We note that:

(a) If $r = m = n$, the distributions of both $d_r$ and $d_r'$ reduce to Massey's distribution [2].

(b) If $r = 1$, then $d_r$ reduces to a special case of the exceedance problem of Gumbel and von Schelling [1].

Tables I and II give the probabilities of $d_r$ and $d_r'$, respectively, for $m = n$.
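For $m = n$ the distribution of $d_r'$ needs only the counts $K_{r,M}$ with $M \le m - r$, doubled by symmetry. The sketch below repeats the recursion of Section 2 so that it is self-contained; the function names are illustrative, and an integer $c \ge 1$ is assumed.

```python
from math import comb

def count_paths(r, c):
    """v[j] = V_{r,j}(c) for j = 1..2c, from V_{i,k} = sum_{j=1}^{k+1} V_{i-1,j}."""
    v = [0] * (2 * c + 2)
    v[c + 1] = 1
    for _ in range(r):
        v = [0] + [sum(v[1:k + 2]) for k in range(1, 2 * c + 1)] + [0]
    return v

def prob_d_r_prime_at_most(m, r, c):
    """Pr(d'_r <= c/m) under F = G for equal sample sizes m = n, with c >= 1."""
    v = count_paths(r, c)
    total = 0
    for M in range(m - r + 1):            # x_r > y_r means at least r of the y's lie below x_r
        j = m - M - r + c + 1
        if 1 <= j <= 2 * c:
            total += v[j] * comb(M + m - r, M)
    return 2 * total / comb(2 * m, m)     # the case x_r < y_r contributes equally when m = n

print([round(prob_d_r_prime_at_most(3, 2, c), 5) for c in (1, 2, 3)])
```

For $m = n = 3$ and $r = 2$ the values are .4, .9 and 1, matching the first row of Table II. A tail probability such as $\Pr(d_r' \ge 9/40)$ for $m = n = 40$ could be obtained as `1 - prob_d_r_prime_at_most(40, r, 8)` for the chosen $r$.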


4. Applications. The statistics $d_r$ and $d_r'$ are useful for situations where the sample sizes are known, but where the information beyond a certain ordered observation, say $x_r$, is unavailable. In life testing, one often wishes, by drawing







TABLE I 
Probability of $d_r \le c/m$


4 5 





2 | .50000 .95000 1.00000 


-45714 .81429 =.98571 1.00000 
-28571 .81429  .98571 1.00000 


-43651 .85714 .96032  .99603 1.00000 
25397 .73810 .96032 .99603 1.00000 
15873 .67857 .93651  .99603 1.00000 


42424 .84091 .94372 .98701  .99892 
-23810 .70130 .92857 .98701  .99892 
-13853 .60390  .89827 .98701 .99892 
-08658 .55519 .87229 .97944  .99892 


-41608 .83042 .93240 .97931 .99592  .99971 1.00000 
-22844 .67920 .90793 .97348 .99592  .99971 1.00000 
-12821 .56643 .85897 .97348 .99502  .99971 1.00000 
07459 .4877 -82634 .96300 .99592 .99971 1.00000 
-04662 .44843 .80186 .95280 .99359  .99971 1.00000 


-41026 .82308 .92424 .97319 .99277 .99876  .99992 
-22191 .66434 .89378  .96232 .99068  .99876  .99992 
-12183 .54336 .83204 .95649  .99068  .99876  .99992 
-06838 .45315 .78291 .94367 .99068  .99876  .99992 
-03978 .39021 .75245  .92968  .98718  .99876  .99902 
02486 .35874 .73007 .91880 .98345 .99806  .99992 


-40588 .81765 .91810  .96833 .98992  .99757 .99963  .99008 
-21719 .65362 .88355 .95352 .98548  .90685  .90063  .99008 
-11748 .52756 .81473 .94272 .98322 .99685  .999063  .90908 
-06450 .43149 .75376 .92236 .98322 .99685  .99963  .99908 
-03620 .35985 .70769  .90539  .97869  .99685 99963 = §=©. 99998 
-02106 .30987 .68005 .89058 .97314 .99572 .99963  .99998 
-01316 .28488  .65981 .87982 .96870 .99443 .99042  .99998 


-40248 .81347 .91331 .96440 .98744 .99637 .99921 .999089  .990909 

-21362 .64551 .87580 .94653 .98086 .99466  .00897 .90089  .99009 

-11431 .51602 .80128 .93192 .97610  .99383 .99897 .99989  .90909 

-06183 .41650 .73309  .90525 97378 .99383 .99897 .99989 99999 

-03395 .34065 .67739 .88049 .96836  .99383 .90897 .99089  .99090 
| .01952 .28409 .63587 .86262 .96121 .99228 .99897 .99989  .99999 

-01108 .24464 .61101 .848061 .95464 .99020 .99861  .99989  .90000 
| .00693 .22491 .59283 .83759  .94987 .98849  .99818  .99983 


-39272 .80172 .89062  .95259 97920 §=.99150 .99600  .99808 

| -20383 .62328 .85472 .92635 .96571 .98547 .09447 .99815 
-10611 .48591 .76593 .90199 -95254 .97933 .99203 .90735 
-05544 .38006 .68171 .88897 .94046 .97372 .98990  .99671 
02909 .29843 .60791 81327 92146 96902 .98828  .99632 
-O1534 .23544 «.54425 «= .77033 Ss 90032 §=—.. 96232 Ss. 98728 «=. 99617 
-00814 .18683 .48977 73200 «6.88009 §=.95482 §=.98578 99617 
-00436 .14935 .44362 .69904  .86243 .94774 .98378  .90582 

| 00236 .12055 .40521 67192 .84825 .94159 .98135  .99515 







TABLE I (continued) 


5 6 7 


-38808 .79626 .89313 .94674 .97478  .98868 .99519  .99808  .99928  .99975 .99992  .99908 
-19939 .61316 .84525 .91681 -95784 .97989 .99100 .99624 .99854 .90948 .90983  .99995 
-10260 .47286 .75053 .88858  .94070 .97047 .98627 .99410 .99767 .99916 .99973  .99092 
-05289 .36526 66048 .83900 .92417 .96103 .98143 .99186 .99674 .99883 .99963 99990 
-02733 .28268 58122 78559 «=.89854 =. 95196 97672 .98969 .99580  .90853 .99954  .99988 
01414 .21923 =. 51231 -73418 .86907 .93937 .97236  .98773 .99512 .99829 .99948  .99987 
00734 .17043 .45256 .68657 83906 .92458 .96661 -98607 .99453 .99813 .99945  .99986 
00382 .13287 .40080 .64319 .81020 .90909 .96046 .98417 .99414 .99804 .99044 99986 
-00200 .10392 8.35603 =. 60405 78336 =. 89401 -95397 .98198  .99011 -99802 .90044 99986 


.38359 .79103 88687 - 94096 97024 - 98549 - 99316 99688 - 99863 -99942 .99976 99991 
-19520 .60360 . 83640 90768 - 94998 - 97393 - 98693 -99370 .99708 -99870 .90044 99977 
-09940 .46086 - 73634 - 87616 - 92922 -96122 -97963 - 98974 -99504 -99771 .99899 99957 
-05065 .35211 -64144 . 82088 90888 - 94809 -97170 .98525 -99265 .99651 99842 - 99932 
02583 .26922 . 55808 -76114 -87750 93498 96350 -98045 .99001 99515 .99776 - 99902 
-01318 .20600 48576 - 70332 -84110 -91690 -95526 -97550 -98724 .99369 .99705 99870 
-00673 15775 42312 64937 -80340 -89530 - 94486 -97055 -98442 -99220 .99631 99836 
-00344 .12092 34041 -59969 76628 87193 -93244 -96472 -98164 -99072 .99559 99803 
00176 = .09278 32189 55418 73062 - 84800 -91867 - £4868 -97855 -98930 .99490 99773 


- 38139 78851 88382 93811 -96793 -98381 - 99203 99617 -99821 99918 .99963 99984 
-19319 .59901 83218 -90327 - 94607 .97086 .98472 99221 -99614 -99814 .99913 99960 
.09790 .45521 .72965 .87031 -92365 .95655  .97606  .98722 .99339  .99668 .99838 99924 
04962 .34605 .72363 .81244 .90164 .94169 .96659  .98148 .99007 .99484 .99741 99874 
.02517 .25040 54759 .74992 .86770 .92678  .95669  .97524 .98633 .99270 .99623 .99812 
-01277 20022 .47403 . 68948 . 82829 . 90623 94661 . 96869 - 98228 .99031 .99489 -99740 
00648 .15239 .41049 .63311 -78741 .88163 .93389 96198 .97803 .98776 .99342  .99660 
.00329 .11604 35572 §.58144 74735 85523 91899 95439 .97402 98543 .99221 99608 
00167 .08840 . 30821 .53369 - 70817 82735 -90161 94461 -96875 -98234 .90025 99482 


two samples, to detect whether one population is the same as another. If the observations become available in order of magnitude, then we can stop the experiment whenever at least $r$ observations of each sample have occurred and reach a decision by the use of $d_r'$. Evidently, by doing so, it would be possible, in many cases, to reduce the average time needed and/or the average number of items destroyed.

As an illustration, we give a numerical example as follows. Suppose fuses are produced by two different methods. One is interested in detecting whether the distribution of the current needed to blow the fuses is the same for fuses produced by the two methods. To this end, one may put on test, say, 40 fuses produced by each method. Suppose one arranges the test in such a way that every fuse in the two samples is subjected to the same current so that the weakest blows first, then the second weakest, etc.

Let us choose in advance that $r = 6$ and $\alpha = .05$. Let $x_1 < x_2 < \cdots$ denote the ordered observed current needed to blow the fuses in the first sample and $y_1 < y_2 < \cdots$ that in the second. Suppose that the actual combined outcomes
ALE LyLoL slay XooLYwWsLo/sXwIut2 --- . Thenthe experiment may be terminated 
when the observation x. has occurred with rejection of the null hypothesis, using 
the statistic $d_r'$, since for $m = n = 40$, Table II gives $\Pr(d_r' \ge 9/40) = .04951$.







TABLE II 


40000 .90000 1.00000 


34286 .77143 =. 97143 00000 
-22857 .77143 -97143 - 00000 


-31746 .71429 =. 92063 = .99206 + 1.00000 
-19048 .64286 .92063 .99206 1.00000 
12698 .64286 .92063  .99206 1.00000 


-30303 .68182 .88745 .97403 .99784 1.00000 

17316 .58442 .85714 .97403 .99784 1.00000 
-10390 .52597 .85714 .97403 .99784 1.00000 
-06926 .52597 .85714 .97403 .99784 1.00000 


-29371 .66084 - 86480 - 95862 -99184 .99942 -00000 
-16317 .55070 + .81585 .94697 .99184  .99942 1.00000 
-09324 .47203 .78788  .94607 .99184  .99942 1.00000 
-05594 .42483 .78788  .94697 .99184  .99042 1.00000 
03730 .42483 .78788 .94607 .99184  .99942 1.00000 


-28718 .64615 .84848 94639 .98555 .99751 99984 
-15664 .52867 .78757 .92463 .98135 .99751 99984 
-08702 .44056 .74281 91298 .98135  .99751 -99984 1.00000 
-04973 .37762 .71733 .91298 .98135 .99751 -99984 1.00000 
-02828 .33986 .71733 .912908 .98135  .99751 99984 00000 
-01989 .33986 .71733 .91208  .98135  .99751 - 99984 00000 


-28235 .63529 -83620 .93665 -97984 99515 99926 19996 - 00000 
- 15204 .51312 - 76709 90703 97096 99371 99926 99996 00000 
-08293 .41983 -71181 88544 -96643 = .99371 -99926 99996 1.00000 
-04607 .34986 -67133 -87413 - 96643 99371 99926 -99996 1.00000 
02633 .29988 -64829 87413 -96643 -99371 99926 - 99996 00000 
01680 .26989 -64829 -87413 96643 -99371 99926 -99996 1.00000 
-01053 .26989 64829 87413 96643 -99371 99926 -99996 1.00000 


.27864 .62694 .82663 92879 97487 99274 99842 «8.99978 .99900 1.00000 
-14861 .50155 75161 -89307 -96172 98933 99794 99978 - 99999 .00000 
08002 .40510 .68926 86384 .95220 .98766 99704 .99978  .999909 1.00000 
-04365 .33144 63955 84300 -94755 -98766 . 99794 99978 99999 00000 
-02425 .27620 .60317 .83218  .94755 98766 .99794 99978  .990909 1.00000 
01386 .23674 58248 83218 94755 98766 99794 .99978  .99909 00000 
-00831 .21307 . 58248 . 83218 -94755 - 98766 -99794 99978 - 99999 . 00000 
-00554 .21307 .58248 .83218 94755 .98766 .99704 99978 99999 00000 


-26820 .60345 79923 §=©. 90517 95840 .98318 99380 99795 § .90041 .90085 .99007 1.00000 
-13946 .47069 - 70045 -85270 93142 97094 Q9S8U4 99629 . 99894 -99975 .90005 . 99999 
07276 .36837 63148 . 80397 . 90509 95865 98406 99469 - 99853 .99968 .90005 . 99999 
-O3811 .28943 .56419 .76009  .880093 94744 .97979 .99343 .90826 .90065 .99905 .99909 
-02006 .22850 . 50637 -72137 85079 93805 765! 99264 - 99816 .99965 .99005 . 99999 
-01062 .18145 45702 -68795 - 84224 93096 9745: 99234 - 99816 -90965 .90005 . 99999 
00566 .14516 .41535 .66003 82875 92646 ; 99234 .990816 .90065 .90905 .99909 
-00305 .11725 38087 63798 81978 92454 737! 99234 - 99816 99965 .990905 . 99999 
00166 .09593 35340 62247 81558 92454 7 -99234 99816 .99965 .90005  .99009 







TABLE II (continued) 


5 6 7 


-26334 .59252 .78631 -89348 .94956 .97735 .99038 .99616 .99856 . ‘ - 99995 
-13543 .45708 69050 8.83363 91568 95978  .98201 -99248 .99709 .99896 .99966  .99990 
-06977 .35320 60709 =. 77718 -88141 .94003 .97254 .98820 .99533 .99832 .99946  .99985 
03601 .27345 .53471 ° 3 84835 .92205 96287 .98372 .99349  .99766 .99925  .99979 
-01863 .21216 .47193 5 81727 .90303 .95344 .97938 .99161 .99706 .99908  .99976 
00066 =. 16501 -41753 347% 78857 .88706 .94472 .97546 .99024 .99658 .99897  .90074 
-00502 .12871 37041 f 7€243 =. 87180 =. 93701 -97215 .98907 .99626 .99890  .99973 
-00262 .10073 .32068 56128 73899 85841 93054 96958 .98828 .99609 .99888  .99973 
OO137 =. 07914 20454 63039 71837 =.84708 .92546 .96781 -98065 .99603 .99888  .99973 


25870 .58207 77374 88192 94047 97098  .98632 .99376  .99725 99883 .99952 .99081 
-13170 44449 67279 81536 80995 .94787 -97386 .98730 .99415  .99730 .99889  .90054 
-06709 .33966 58509 75233 85844 92244 .95925 .97947 .99009  .99542 .99798  .99015 

| .03420 .25074 50914 69395 81775 .89618  .94341 97050 + .98530 99301 .99684 99864 
-01745 = .19878 44338 64035 77873 -86996 .92701 -96089 .98003 .99030 .99552 99805 

00891 15226 38644 50128 .74172 84434 -91053 95101 97448 98739 .99410 .99740 

00455 = .11673 33711 54641 70685 81962 89430 94110 96885 .98440 .99263 .9967: 
.00233 .08958 29438 50543 67412 .79599 .87856 .93138 .96328 .98144 .99118 .99607 

OO119 §=.06883 25734 - 46801 64349 - 76899 86348 .90360 95789 .97859 .98979 99546 


25645 .57702 76765 87621 93586 -96763 -98407 - 99235 -99641 -99836 .99927 99968 
12904 13853 66436 80654 89214 -94172 - 96944 - 98442 -99228 99628 .99826 99921 
06586 .33341 57487 74061 84730 91310 .95212 -97444 - 98677 99336 .99677 99848 
03339 .25358 49756 67961 80328 . 88338 - 93318 -96297 98014 98969 .99482 99748 
-O1604 19204 43082 5236: 76095 85356 -91337 -95049 97265 -98540 .99247 99625 
. 00860 14686 37319 72071 82419 89321 93739 96455 -98063 .98978 99480 
00436 11184 $2341 56 68267 79564 -87308 -92396 95606 97551 .98685 - 99320 
00222 .08521 2806 | 327 64740 76874 85391 91114 -94804 -97086 .98443 99216 
OO113 =. 06496 24324 375 61307 74161 - 83380 - 89688 - 93852 -96468 .98051 G8964 


In this particular experiment, only 20 per cent of the fuses are destroyed in reaching a decision.

It, perhaps, should be remarked that if we define

$$D_r = \max_{x \ge x_{n-r+1}} |S_n(x) - S_m(x)|,$$

$$D_r' = \max_{x \ge \min(x_{n-r+1},\, y_{m-r+1})} |S_n(x) - S_m(x)|, \qquad r \le \min(m, n),$$

then the distributions of $D_r$ and $D_r'$ are identical with those of $d_r$ and $d_r'$. Thus, in a test, if the information below a certain ordered observation is unavailable, or if the observations become available in decreasing order, that is, $x_n, x_{n-1}, \cdots, x_1$ and $y_m, y_{m-1}, \cdots, y_1$, then $D_r$ and $D_r'$ would be the appropriate statistics to use.

In conclusion, I would like to thank Mrs. Dorothy Wolfe who carried out the computations of Tables I and II.


REFERENCES

[1] E. J. Gumbel and H. von Schelling, "The distribution of the number of exceedances," Ann. Math. Stat., Vol. 21 (1950), pp. 247-262.

[2] F. J. Massey, "The distribution of the maximum deviation between two sample cumulative step functions," Ann. Math. Stat., Vol. 22 (1951), pp. 125-128.

[3] S. S. Wilks, "Statistical prediction with special reference to the problem of tolerance limits," Ann. Math. Stat., Vol. 13 (1942), pp. 400-409.





ON DISTRIBUTION-FREE STATISTICS¹
By Z. W. Birnbaum and H. Rubin
University of Washington and Stanford University


1. Introduction. Let $X_1, X_2, \cdots, X_n$ be a sample of a one-dimensional random variable $X$ which has the continuous cumulative probability function $F$. It has been observed [1] that, to the authors' knowledge, all distribution-free statistics considered in the past can be written in the form $\Phi[F(X_1), F(X_2), \cdots, F(X_n)]$ where $\Phi$ is a measurable symmetric function defined on the unit-cube $\{U: 0 \le U_i \le 1,\ i = 1, 2, \cdots, n\}$. It is the purpose of this paper to study the relationship between the class of statistics which can be written in this particular form and the class of distribution-free statistics.


2. Distribution-free statistics and statistics of structure (d). Let $\Omega$ and $\Omega'$ be two families of cumulative probability functions. A real quantity $W = S(X_1, X_2, \cdots, X_n, G)$ will be called a statistic in $\Omega$ with regard to $\Omega'$ if, for any $G \in \Omega$, $F \in \Omega'$, and $X_1, X_2, \cdots, X_n$ in the $n$-dimensional sample-space for a random variable $X$ which has the cumulative probability function $F$,

(i) $S(X_1, X_2, \cdots, X_n, G)$ is defined almost everywhere in the sample-space $X_1, X_2, \cdots, X_n$ (i.e. with the possible exception of a set of probability zero), and

(ii) $W = S(X_1, X_2, \cdots, X_n, G)$ has a probability distribution; this probability distribution will be denoted by $\mathcal{P}(W; F) = \mathcal{P}[S(X_1, X_2, \cdots, X_n, G); F]$.


For example, Kolmogorov's statistic

$$D_n = \sup_{-\infty < x < \infty} |F_n(x) - G(x)|, \tag{2.1}$$

where $F_n$ is the empirical cumulative distribution function determined by the sample $X_1, X_2, \cdots, X_n$, satisfies (i) and (ii) when $\Omega = \Omega' = \Omega_1$, the class of all nondegenerate cumulative probability functions², hence $D_n$ is a statistic in $\Omega_1$ with regard to $\Omega_1$.

If for a statistic $S(X_1, X_2, \cdots, X_n, G)$ in $\Omega$ with regard to $\Omega'$ there exists a function $\Phi$ defined on the $n$-dimensional unit cube and symmetric in its arguments, such that for any $G \in \Omega$, $F \in \Omega'$ we have

$$S(X_1, X_2, \cdots, X_n, G) = \Phi[G(X_1), G(X_2), \cdots, G(X_n)]$$


Received 8/6/53. 

¹ Work done under the sponsorship of the Office of Naval Research.

² The notations for various classes of cumulative probability functions are those introduced by Scheffé [2].









almost everywhere³ in the sample space $X_1, X_2, \cdots, X_n$ for the random variable $X$ which has the cumulative probability function $F$, then we shall say that $S(X_1, X_2, \cdots, X_n, G)$ is a statistic of structure (d).

Kolmogorov's statistic (2.1) is an example of a statistic of structure (d), since it can be written as

$$D_n = \max_{i=1,\cdots,n} \left\{ \max \left[ G(X_i') - \frac{i-1}{n},\ \frac{i}{n} - G(X_i') \right] \right\},$$

where $X_1', X_2', \cdots, X_n'$ are the numbers $X_1, X_2, \cdots, X_n$ ordered increasingly.
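The structure (d) form of Kolmogorov's statistic can be made concrete with a short computation: the statistic is evaluated from the values $G(X_i)$ alone, so two samples that map to the same values give the same $D_n$. This is a sketch using only the standard library; the sample values and function names are illustrative.

```python
import math

def kolmogorov_from_probabilities(u):
    """D_n as a symmetric function of the values u_i = G(X_i) only."""
    u_sorted = sorted(u)
    n = len(u_sorted)
    return max(max(ui - (i - 1) / n, i / n - ui)
               for i, ui in enumerate(u_sorted, start=1))

def kolmogorov_statistic(sample, G):
    """D_n = sup |F_n(x) - G(x)| for a continuous G, via the ordered-sample formula."""
    return kolmogorov_from_probabilities([G(x) for x in sample])

u = [0.62, 0.05, 0.91, 0.33, 0.48]
exp_sample = [-math.log(1.0 - ui) for ui in u]   # values whose exponential c.d.f. equals u

d_uniform = kolmogorov_statistic(u, lambda x: x)
d_exponential = kolmogorov_statistic(exp_sample, lambda x: 1.0 - math.exp(-x))
print(d_uniform, d_exponential)
```

Both calls return the same value up to rounding, because the statistic sees the data only through $G(X_1), \cdots, G(X_n)$; permuting the sample also leaves it unchanged, which reflects the symmetry of $\Phi$.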

If $\Omega = \Omega'$ and the statistic $S(X_1, X_2, \cdots, X_n, G)$ has the property that the probability distribution $\mathcal{P}[S(X_1, X_2, \cdots, X_n, G); G]$ is independent of $G$ for $G \in \Omega$, we shall say that $S(X_1, X_2, \cdots, X_n, G)$ is a distribution-free statistic in $\Omega$.

Let us now assume $\Omega = \Omega' = \Omega_2$, the class of all continuous cumulative probability functions. Denoting by $R$ the rectangular distribution in $(0, 1)$ and by $U_1, \cdots, U_n$ a sample of size $n$ of a random variable with distribution $R$ we have

$$\mathcal{P}\{\Phi[G(X_1), \cdots, G(X_n)]; G\} = \mathcal{P}\{\Phi(U_1, \cdots, U_n); R\}.$$

It follows that if a statistic in $\Omega_2$ with regard to $\Omega_2$ has structure (d) then it is distribution-free in $\Omega_2$.

All distribution-free statistics considered in literature happen to have structure (d), with $\Omega = \Omega' = \Omega_2$. Nevertheless the conjecture that every distribution-free statistic, symmetric in $X_1, X_2, \cdots, X_n$, with $\Omega = \Omega' = \Omega_2$, must have structure (d) is not true. This can be seen from the following counter-example.

Let $\omega_1$ and $\omega_2$ be nonempty, mutually exclusive subsets of $\Omega_2$ such that $\omega_1 \cup \omega_2 = \Omega_2$. Denoting by $F_n$ again the empirical cumulative distribution function determined by a sample of size $n$, we define

$$S = \begin{cases} S_1 = \sup\limits_{-\infty < x < \infty} [F(x) - F_n(x)] & \text{if } F \in \omega_1, \\ S_2 = \sup\limits_{-\infty < x < \infty} [F_n(x) - F(x)] & \text{if } F \in \omega_2. \end{cases}$$

Since $S_1$ and $S_2$ are distribution-free statistics with the same probability distribution, $S$ is a distribution-free statistic. It is, however, clearly not a statistic of structure (d).


3. Strongly distribution-free statistics. Let $\Omega^{*}$ be the family of all continuous cumulative probability functions such that if $G \in \Omega^{*}$ then $G$ is strictly increasing at all $x$ for which $0 < G(x) < 1$. Clearly if $G \in \Omega^{*}$ then the inverse function $G^{-1}$ is defined on the open unit interval.


³ The exceptional set of probability zero may depend on $G$.







We now consider a statistic $S(X_1, X_2, \cdots, X_n, G)$ in $\Omega^{*}$ with regard to some family $\Omega'$ of cumulative probability functions. This statistic shall be called strongly distribution-free in $\Omega^{*}$ with regard to $\Omega'$ if the probability distribution $\mathcal{P}[S(X_1, X_2, \cdots, X_n, G); F]$ depends only on the function $\tau = FG^{-1}$ for all $G \in \Omega^{*}$, $F \in \Omega'$.

It is easily seen that, for $\Omega' = \Omega^{*}$, a strongly distribution-free statistic is distribution-free. For if $\mathcal{P}[S(X_1, X_2, \cdots, X_n, G); F]$ depends only on $FG^{-1}$ for all $F, G \in \Omega^{*}$, then in particular $\mathcal{P}[S(X_1, X_2, \cdots, X_n, G); G]$ depends only on $GG^{-1} = I$, hence is independent of $G$. One also verifies immediately that if a statistic in $\Omega^{*}$ with regard to $\Omega^{*}$ has structure (d) then it is strongly distribution-free, since then $\mathcal{P}\{\Phi[G(X_1), G(X_2), \cdots, G(X_n)]; F\} = \mathcal{P}\{\Phi[U_1, U_2, \cdots, U_n]; FG^{-1}\}$.

Since all practically important distribution-free statistics are symmetric in $X_1, X_2, \cdots, X_n$ and strongly distribution-free, as well as of structure (d), one again may conjecture that under some fairly general assumptions these two properties are equivalent. This conjecture is found to be correct for $\Omega = \Omega' = \Omega^{*}$. We have already seen that if a statistic has structure (d) it is strongly distribution-free; it remains only to prove the converse statement:

Theorem. If a statistic $W = S(X_1, X_2, \cdots, X_n, G)$ in $\Omega^{*}$ with regard to $\Omega^{*}$ is symmetric in $X_1, X_2, \cdots, X_n$ and strongly distribution-free, then it has structure (d).

The proof of this theorem makes use of a lemma which will be presented in the next section.


4. Lemma. Let $H$ be a strictly increasing continuous function on the closed unit-interval, such that $H(0) = 0$, $H(1) = 1$; $\mu_H$ the measure defined by $H$ on the unit-interval $I_1$; $\mu_H^{(n)}$ the corresponding product-measure on the $n$-dimensional unit-cube $I_n$. Then, for any set $M \subset I_n$ with $\mu_H^{(n)}(M) > 0$ and any $\epsilon > 0$, there exist sets $Q_1, Q_2, \cdots, Q_n$ in $I_1$ such that

(i) $Q_1, Q_2, \cdots, Q_n$ are disjoint, $\mu_H$-measurable, with $\mu_H(Q_i) > 0$ for $i = 1, \cdots, n$;

(ii) for $Q_0 = \text{Compl.} \bigcup_{i=1}^{n} Q_i$ we have $\mu_H(Q_0) > 0$;

(iii) if $Q_i$ is placed on the $y_i$-axis, $i = 1, 2, \cdots, n$, then the product-set $Q = Q_1 \times Q_2 \times \cdots \times Q_n$ in $I_n$ has the property

$$\mu_H^{(n)}(Q \cap M)/\mu_H^{(n)}(Q) > 1 - \epsilon.$$


Proof. It may be assumed without loss of generality that $H(y) = y$, so that $\mu_H$ and $\mu_H^{(n)}$ are Lebesgue measures. Let $C_{\eta; y_1, \cdots, y_n}$ denote the cube $|Y_i - y_i| < \eta$ in the $(Y_1, Y_2, \cdots, Y_n)$ space, with center $(y_1, y_2, \cdots, y_n)$ and volume $\mu_H^{(n)}(C_{\eta; y_1, \cdots, y_n}) = (2\eta)^n$.

It is well known that

$$\lim_{\eta \to 0} (2\eta)^{-n} \mu_H^{(n)}(M \cap C_{\eta; y_1, \cdots, y_n}) = 1 \tag{4.1}$$

for almost all points in $M$ (see e.g. [3] p. 129). The subset of those points of $M$ for which no two coordinates are equal and none is 0 or 1 has the same measure as $M$. Let $M_1$ be the set of all points of $M$ for which (4.1) holds and which have no two coordinates equal and no coordinate 0 or 1. Then $\mu_H^{(n)}(M_1) = \mu_H^{(n)}(M) > 0$. Let $(y_1^0, \cdots, y_n^0)$ be a point in $M_1$, and let

$$\lambda = \min \left\{ \min_{i} y_i^0,\ \min_{i} (1 - y_i^0),\ \min_{i \ne j} |y_i^0 - y_j^0| \right\}.$$

Clearly $0 < \lambda < \tfrac{1}{2}$, and for $0 < \eta < \lambda/2$ the intervals

$$Q_i: (y_i^0 - \eta,\ y_i^0 + \eta), \qquad i = 1, \cdots, n, \tag{4.2}$$

are all in $I_1$ and satisfy (i) and (ii). If $Q_i$ is placed on the $Y_i$-axis then the product-set $Q = Q_1 \times Q_2 \times \cdots \times Q_n$ is the cube $C_{\eta; y_1^0, \cdots, y_n^0}$. According to (4.1) there exists an $\eta_0 > 0$ such that

$$(2\eta)^{-n} \mu_H^{(n)}(M \cap C_{\eta; y_1^0, \cdots, y_n^0}) > 1 - \epsilon$$

for $\eta < \eta_0$. Choosing $\eta < \min(\eta_0, \lambda/2)$ and constructing the intervals (4.2) one obtains the $Q_i$ required by the lemma.


5. Proof of theorem. When the random variable $X$ has the cumulative probability function $F$, the random variable $Y = G(X)$ has the cumulative probability function $H = FG^{-1}$. Setting $Y_i = G(X_i)$ we, therefore, have

$$W = S(X_1, \cdots, X_n, G) = S[G^{-1}(Y_1), \cdots, G^{-1}(Y_n), G],$$

and

$$\mathcal{P}[S(X_1, \cdots, X_n, G); F] = \mathcal{P}\{S[G^{-1}(Y_1), \cdots, G^{-1}(Y_n), G]; FG^{-1}\} = \mathcal{P}\{S[G^{-1}(Y_1), \cdots, G^{-1}(Y_n), G]; H\}.$$

By assumption, this last probability distribution depends only on the cumulative probability function $H$, and not on $G$. From this and the symmetry assumption we wish to conclude that $S[G^{-1}(Y_1), \cdots, G^{-1}(Y_n), G]$ can be written in the form of a function $\Phi(Y_1, \cdots, Y_n)$, independent of $G$ except on a set of $H$-measure zero.

To prove this, we assume that for some $G_1, G_2 \in \Omega^{*}$ we have $S[G_1^{-1}(Y_1), \cdots, G_1^{-1}(Y_n), G_1] \ne S[G_2^{-1}(Y_1), \cdots, G_2^{-1}(Y_n), G_2]$ on a set of positive $H$-measure. Without loss of generality we may assume

$$2 > k > S[G_1^{-1}(Y_1), \cdots, G_1^{-1}(Y_n), G_1] - S[G_2^{-1}(Y_1), \cdots, G_2^{-1}(Y_n), G_2] > \delta > 0 \tag{5.1}$$

on a set $M$ in the unit cube $I_n$, where $M$ is symmetric and has positive measure. For any $H$, continuous and strictly increasing in $I_1$, and any $\epsilon > 0$, we construct sets $Q_1, Q_2, \cdots, Q_n$ according to the lemma in Section 4 and have

$$\mu_H^{(n)}(Q \cap M)/\mu_H^{(n)}(Q) > 1 - \epsilon. \tag{5.2}$$


For any

$$a_i > 0, \quad i = 0, 1, \cdots, n, \quad \text{and} \quad a_0 + \sum_{i=1}^{n} a_i = 1 \tag{5.3}$$

we define the set function

$$K_{a_0, \cdots, a_n}(T) = \sum_{i=0}^{n} a_i \cdot \frac{\mu_H(T \cap Q_i)}{\mu_H(Q_i)}$$

for any measurable $T \subset I_1$. This clearly is a probability measure in $I_1$. Taking for $T$ the interval $(0, y)$ we obtain a strictly increasing continuous cumulative probability function which will be denoted by $K_{a_0, \cdots, a_n}$.

Without loss of generality, $S$ may be assumed bounded, since otherwise we could consider $S/(1 + |S|)$. This assures the existence of the mathematical expectation of $S$. Since $S[G_1^{-1}(Y_1), \cdots, G_1^{-1}(Y_n), G_1]$ and $S[G_2^{-1}(Y_1), \cdots, G_2^{-1}(Y_n), G_2]$ have the same probability distribution if $Y_1, Y_2, \cdots, Y_n$ are a sample of a random variable $Y$ with the cumulative probability function $K_{a_0, \cdots, a_n}$, their mathematical expectations are equal

$$E\{S[G_1^{-1}(Y_1), \cdots, G_1^{-1}(Y_n), G_1] - S[G_2^{-1}(Y_1), \cdots, G_2^{-1}(Y_n), G_2];\ K_{a_0, \cdots, a_n}\} = 0. \tag{5.4}$$
Using the abbreviations 

SIGS? (V1), «++, GT" (¥), GJ = SAN, +: 


we write the left-hand side of (5.4) explicitly 


1 n 
I [Si(¥4, 7 rw = SY; ee Y,)) II dK. 4, ,.--,0,( Ys) 
¥ n=O v1 


n 


> >a >a | of [Si(¥i, «++ Yn) — S(¥1,--+, Yad] 
rr) in YY 140), YneQj, 


- [] dKa,.....2.(¥) 


t= 1 


n n 
> aes > aj, aj, 


j,=0 in=O un(Qi,) - - un(Qi,) 


“h 


Since $S_1(Y_1, \ldots, Y_n)$, $S_2(Y_1, \ldots, Y_n)$ and $M$ are symmetric in $Y_1, \ldots, Y_n$, all the terms of the sum which correspond to different permutations of the same $n$ subscripts $j_1, \ldots, j_n$ (out of the $n + 1$ possible values $0, 1, \ldots, n$) are equal.


I (Si(¥i, +++, Yn) — So(¥i,--+, Val dH(Y,) --- dH(Y)). 
@ 


ii in 







Collecting these equal terms, we obtain a polynomial in $a_0, a_1, \ldots, a_n$, which according to (5.4) vanishes identically under the restrictions (5.3). It follows that each of the integrals in the last term of (5.5) must vanish, and in particular


/ [ ern [Sy(Y, ? Yo, as Y») — S(¥i, Yo, ae Y,)] dy, Per dY» dy, = 0; 
Qi “Qe Qn 


which, for $\varepsilon$ sufficiently small, contradicts (5.1) and (5.2).


REFERENCES 
[1] Z. W. Birnbaum, "Distribution-free tests of fit for continuous distribution functions," Ann. Math. Stat., Vol. 24 (1953), pp. 1-8.
[2] H. Scheffé, "On a measure problem arising in the theory of nonparametric tests," Ann. Math. Stat., Vol. 14 (1943), pp. 227-233.
[3] S. Saks, "Theory of the integral," Monografie Matematyczne, Vol. 7, Warszawa-Lwów, 1937.





A NOTE ON PARTIALLY BALANCED DESIGNS 


By Marvin ZELEN 


National Bureau of Standards 


It is well known [1], [2] that a singular group divisible design containing two 
associate classes can be derived from a balanced incomplete block design by 
replacing each treatment by n treatments. In this paper it is shown that a 
partially balanced design with (m + 1) associate classes can be derived from 
a partially balanced design with m associate classes by replacing each treatment 
by n treatments. 

The definition of a partially balanced incomplete block design with $m$ associate classes can briefly be described as an experimental plan

(i) having $v$ treatments arranged in $b$ blocks such that each block contains $k$ experimental units,

(ii) where each treatment is replicated $r$ times and no treatment occurs more than once in any block,

(iii) such that with respect to any treatment $t$, the remaining treatments can be divided into $m$ associate classes such that the $i$th class contains $n_i$ treatments and $t$ occurs in $\lambda_i$ blocks with each of the treatments in the $i$th class ($i = 1, 2, \ldots, m$),

(iv) and if two treatments are $k$th associates, the number of treatments common to the $i$th associates of one and the $j$th associates of the other treatment is $p^k_{ij}$ (for $i, j, k = 1, 2, \ldots, m$, with $p^k_{ij} = p^k_{ji}$), and is independent of the particular pair of treatments.

It has been shown [3] that the following relations hold between the parameters 
of the design. 


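$$vr = bk, \tag{1}$$

$$\sum_{i=1}^{m} n_i = v - 1, \tag{2}$$

$$\sum_{i=1}^{m} n_i \lambda_i = r(k - 1), \tag{3}$$

$$\sum_{j=1}^{m} p^k_{ij} = \begin{cases} n_i - 1 & \text{if } i = k, \\ n_i & \text{if } i \neq k, \end{cases} \tag{4}$$

$$n_i p^i_{jk} = n_j p^j_{ik} = n_k p^k_{ij}. \tag{5}$$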


The main result of this paper can be stated in the following theorem. 


Received 9/14/53. 






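Main Theorem. If, in a partially balanced incomplete block design having $m$ associate classes and parameters

$$v^*,\ b^*,\ r^*,\ k^*,\ \lambda_i^*,\ n_i^*,\ p^{*k}_{ij} \qquad (i, j, k = 1, 2, \ldots, m), \tag{6}$$

such that $\lambda_i^* \neq r^*$ ($i = 1, 2, \ldots, m$), each treatment is replaced by $n$ different treatments, the derived design will be a partially balanced incomplete block design with $(m + 1)$ associate classes having parameters

$$\begin{aligned}
& v = nv^*, \quad b = b^*, \quad r = r^*, \quad k = nk^*, \\
& \lambda_i = \lambda_i^*, \quad n_i = n n_i^* \quad (i = 1, 2, \ldots, m), \\
& \lambda_{m+1} = r, \quad n_{m+1} = n - 1, \\
& p^k_{ij} = n p^{*k}_{ij} \quad (i, j, k = 1, 2, \ldots, m), \\
& p^k_{k,m+1} = n - 1, \quad p^k_{i,m+1} = 0 \ \text{for } i \neq k \quad (i, k = 1, 2, \ldots, m), \\
& p^{m+1}_{ii} = n n_i^*, \quad p^{m+1}_{ij} = 0 \ \text{for } i \neq j \quad (i, j = 1, 2, \ldots, m), \\
& p^{m+1}_{m+1,m+1} = n - 2.
\end{aligned} \tag{7}$$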
Proof. Let $t$ be a treatment in the original design and denote the remaining treatments by $t_j^{(i)}$ for $j = 1, 2, \ldots, n_i^*$; $i = 1, 2, \ldots, m$, where $t_j^{(i)}$ are the $i$th associate treatments of $t$. Denote the treatments replacing $t$ and $t_j^{(i)}$ in the derived design by the row vectors $\mathbf{t} = (t_1, t_2, \ldots, t_n)$ and $\mathbf{t}_j^{(i)} = (t_{j1}^{(i)}, t_{j2}^{(i)}, \ldots, t_{jn}^{(i)})$ respectively; then the row vector $\mathbf{T}_i = (\mathbf{t}_1^{(i)}, \mathbf{t}_2^{(i)}, \ldots, \mathbf{t}_{n_i^*}^{(i)})$ denotes the $i$th associates of any treatment element of $\mathbf{t}$. If $t$ and $t'$ are two $k$th associate treatments in the original design, and if $t_x$ is any treatment element of $\mathbf{t}$ and $t'_y$ is any treatment element of $\mathbf{t}'$, then $t_x$ and $t'_y$ will be $k$th associates. With respect to each of these treatments, the remaining treatments can be divided into $(m + 1)$ associate classes in the following manner:

ASSOCIATE CLASSES

Treatment    Classes 1, ..., m          Class m + 1
t_x          T_1, ..., T_m              t_1, ..., t_{x-1}, t_{x+1}, ..., t_n
t'_y         T'_1, ..., T'_m            t'_1, ..., t'_{y-1}, t'_{y+1}, ..., t'_n

Upon replacing each treatment in the original design by $n$ different treatments, the new design will have $v = nv^*$, $b = b^*$, $k = nk^*$, $r = r^*$. From the array we see that $n_i = nn_i^*$ (for $i = 1, 2, \ldots, m$) and $n_{m+1} = n - 1$. Since any treatment in the original design occurred in $\lambda_i^*$ blocks with each of its $i$th associates, the new treatments will occur in $\lambda_i = \lambda_i^*$ blocks with each of their $i$th associates for $i = 1, 2, \ldots, m$. Also each treatment will occur in $r$ blocks with each of its $(m + 1)$th associates, that is, $\lambda_{m+1} = r$.

Since $t$ and $t'$ have $p^{*k}_{ij}$ treatments in common which are $i$th associates (say) of $t$ and $j$th associates of $t'$, then $t_x$ and $t'_y$ will have $p^k_{ij} = np^{*k}_{ij}$ treatments in common which are $i$th associates of $t_x$ and $j$th associates of $t'_y$ for $i, j, k = 1, 2, \ldots, m$. It is readily seen that the number of treatments in common between the $k$th associates of $t_x$ and the $(m + 1)$th associates of $t'_y$ is $p^k_{k,m+1} = (n - 1)$ for $k = 1, 2, \ldots, m$, and that $p^k_{i,m+1} = 0$ for $i \neq k$ and $i, k = 1, 2, \ldots, m$. Similarly if, with respect to a pair of treatments which are $(m + 1)$th associates, the remaining treatments were put in an array, it can be demonstrated that $p^{m+1}_{ij} = 0$ for $i \neq j$, while $p^{m+1}_{ii} = nn_i^*$ for $i = 1, 2, \ldots, m$, and $p^{m+1}_{m+1,m+1} = n - 2$.

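It is now necessary to show that the relations (1) through (5) are satisfied. For (1) we have

$$bk = b^* n k^* = n r^* v^* = rv.$$

For (2) we have

$$\sum_{i=1}^{m+1} n_i = n(v^* - 1) + (n - 1) = v - 1.$$

For (3) we have

$$\sum_{i=1}^{m+1} n_i \lambda_i = n r^*(k^* - 1) + (n - 1)r = r(k - 1).$$

For (4) we have

$$\sum_{i=1}^{m+1} p^k_{ij} = \begin{cases} n n_j^* = n_j, & j \neq k;\ j = 1, 2, \ldots, m;\ k = 1, \ldots, m + 1, \\ n(n_j^* - 1) + (n - 1) = n_j - 1, & j = k;\ j, k = 1, 2, \ldots, m, \end{cases}$$

$$\sum_{i=1}^{m+1} p^k_{i,m+1} = n - 1 = n_{m+1} \qquad (k = 1, 2, \ldots, m),$$

$$\sum_{i=1}^{m+1} p^{m+1}_{i,m+1} = n - 2 = n_{m+1} - 1.$$

For (5) we have

$$n_i p^i_{jk} = n^2 n_i^* p^{*i}_{jk} = n^2 n_j^* p^{*j}_{ik} = n_j p^j_{ik} = n^2 n_k^* p^{*k}_{ij} = n_k p^k_{ij}, \qquad i, j, k = 1, 2, \ldots, m,$$

$$n_{m+1} p^{m+1}_{ij} = n_i p^i_{j,m+1} = n_j p^j_{i,m+1} = 0, \qquad i \neq j;\ i, j \neq m + 1,$$

$$n_i p^i_{i,m+1} = n n_i^*(n - 1) = n_{m+1} p^{m+1}_{ii}, \qquad i = 1, 2, \ldots, m.$$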
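The verification of relations (1) through (5) can also be sketched numerically. The block below is only an illustration under stated assumptions: the function names `derive_design` and `satisfies_relations` are hypothetical, class indices are 0-based, and the starting design is an assumed small group divisible example (4 treatments in two groups {1, 2} and {3, 4}, with blocks {1,3}, {1,4}, {2,3}, {2,4}), chosen because its parameters are easy to check by hand.

```python
def derive_design(v, b, r, k, lam, n_assoc, P, n):
    """Parameters of the design obtained by replacing each treatment by n treatments.

    lam[i], n_assoc[i] and P[c][i][j] hold lambda_i*, n_i* and p*^c_ij with
    0-based class indices; the new (m+1)th class is stored at index m.
    """
    m = len(lam)
    new_P = [[[0] * (m + 1) for _ in range(m + 1)] for _ in range(m + 1)]
    for c in range(m):
        for i in range(m):
            for j in range(m):
                new_P[c][i][j] = n * P[c][i][j]      # p^k_ij = n p*^k_ij
        new_P[c][c][m] = new_P[c][m][c] = n - 1       # p^k_{k,m+1} = n - 1
    for i in range(m):
        new_P[m][i][i] = n * n_assoc[i]               # p^{m+1}_ii = n n_i*
    new_P[m][m][m] = n - 2                            # p^{m+1}_{m+1,m+1}
    return {
        "v": n * v, "b": b, "r": r, "k": n * k,
        "lam": list(lam) + [r],
        "n_assoc": [n * ni for ni in n_assoc] + [n - 1],
        "P": new_P,
    }


def satisfies_relations(d):
    """Check relations (1)-(5) for a parameter set stored as a dict."""
    v, b, r, k = d["v"], d["b"], d["r"], d["k"]
    lam, na, P = d["lam"], d["n_assoc"], d["P"]
    classes = range(len(lam))
    rel1 = v * r == b * k
    rel2 = sum(na) == v - 1
    rel3 = sum(ni * li for ni, li in zip(na, lam)) == r * (k - 1)
    rel4 = all(sum(P[c][i][j] for j in classes) == na[i] - (1 if i == c else 0)
               for c in classes for i in classes)
    rel5 = all(na[i] * P[i][j][c] == na[j] * P[j][i][c] == na[c] * P[c][i][j]
               for i in classes for j in classes for c in classes)
    return rel1 and rel2 and rel3 and rel4 and rel5


# Assumed illustrative two-class design (lambda_1 = 0, lambda_2 = 1, r = 2).
base = dict(v=4, b=4, r=2, k=2, lam=[0, 1], n_assoc=[1, 2],
            P=[[[0, 0], [0, 2]], [[0, 1], [1, 0]]])
derived = derive_design(n=2, **base)
print(satisfies_relations(base), satisfies_relations(derived))
```

With `n=2` the derived parameter set has three associate classes, and the same checker can be applied to it unchanged because the new class is simply appended as the last index.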


The condition that $\lambda_i^* \neq r^*$ arises from the fact that if this condition is not true, then with respect to a particular treatment $t$ there will exist a group of (say) $m$th associate treatments which will occur with $t$ in exactly $r^*$ blocks. Since every treatment is replicated $r^*$ times, $t$ will always appear with the same group of treatments. Thus if a treatment occurs in a certain block, then every $m$th associate treatment will also occur in that block. Therefore it is possible to replace each group of treatments by a single treatment to derive a new design.







It can be shown, using an argument similar to that used to prove the main 
theorem, that the derived design will be a partially balanced incomplete block 
design with (m — 1) associate classes having parameters 


* 
v=v = Pa 7. k = k*/n,, 


(8) uM =X, =ni/n i 


+ 


k wk <b 
ij Pij/m t, J, % 


\p 
This last result can be summarized in the following theorem. 

Theorem: If in a partially balanced design having $m$ associate classes and parameters (6) such that (say) $\lambda_m^* = r^*$, then the treatments can be divided into
v*/n>, groups so that each treatment occurs in a block with all the treatments of its 
group r* times. Also it is possible to replace each group of treatments by one treat- 
ment to derive a partially balanced design with (m — 1) associate classes and pa- 
rameters given by (8). 

A large number of partially balanced incomplete block designs with two 
associate classes are available [4], [5]. These designs can be used to construct 
three associate class designs having parameters 


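$$v = nv^*, \quad b = b^*, \quad r = r^*, \quad k = nk^*,$$

$$\lambda_1 = \lambda_1^*, \quad \lambda_2 = \lambda_2^*, \quad \lambda_3 = r, \qquad n_1 = nn_1^*, \quad n_2 = nn_2^*, \quad n_3 = n - 1,$$

$$P^1 = \begin{pmatrix} np^{*1}_{11} & np^{*1}_{12} & n - 1 \\ np^{*1}_{12} & np^{*1}_{22} & 0 \\ n - 1 & 0 & 0 \end{pmatrix}, \quad
P^2 = \begin{pmatrix} np^{*2}_{11} & np^{*2}_{12} & 0 \\ np^{*2}_{12} & np^{*2}_{22} & n - 1 \\ 0 & n - 1 & 0 \end{pmatrix}, \quad
P^3 = \begin{pmatrix} nn_1^* & 0 & 0 \\ 0 & nn_2^* & 0 \\ 0 & 0 & n - 2 \end{pmatrix}.$$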


REFERENCES 

[1] R. C. Bose and W. S. Connor, "Combinatorial properties of group divisible incomplete block designs," Ann. Math. Stat., Vol. 22 (1952), pp. 367-383.

[2] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced incomplete block designs with two associate classes," J. Amer. Stat. Assn., Vol. 47 (1952), pp. 151-184.

[3] R. C. Bose and K. R. Nair, "Partially balanced incomplete block designs," Sankhyā, Vol. 4 (1939), pp. 337-372.

[4] R. C. Bose, S. S. Shrikhande and K. N. Bhattacharya, "On the construction of group divisible incomplete block designs," Ann. Math. Stat., Vol. 24 (1953), pp. 167-195.

[5] R. C. Bose, W. H. Clatworthy and S. S. Shrikhande, "Tables of Partially Balanced Designs With Two Associate Classes," Univ. North Carolina, Mimeo. Series 77, (1953).





NOTES 


CORRELATION BETWEEN A DISCRETE AND A CONTINUOUS 
VARIABLE. POINT-BISERIAL CORRELATION 


By Robert F. Tate


University of Washington¹


1. Introduction and Summary. A problem of some importance in statistical applications, especially in the field of psychology, is that of finding a measure of association between a discrete random variable $X$, which takes the values 0 and 1, and a continuous random variable $Y$. The ordinary product-moment correlation coefficient $\rho(X, Y)$ is used for this purpose. It has received the name point-biserial correlation coefficient because of its relation to the biserial correlation coefficient proposed by Karl Pearson for a similar problem. The usual estimator $r$, based on a random sample $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is referred to as the sample point-biserial correlation coefficient.

The psychological value of $\rho$ (and hence of $r$) is that it affords a measure of the degree of association between a trait and a measurable characteristic, usually an ability of some kind. For the $i$th individual in a random sample of $n$ individuals, $X_i$ has the value 1 if the trait is possessed and $Y_i$ is a measure of the ability in question.

We shall give in Section 2 the appropriate mathematical model, based on normal theory, and the asymptotic distribution of $r$ (Theorem 1), the derivation of which is an elementary application of a well known theorem of Cramér. An important special case of this distribution will be discussed in Section 3, namely that in which $X$ takes the values 0 and 1 with equal probabilities. In this connection a variance-stabilizing transformation will be given (Theorem 2). Numerical work based on this transformation may be carried out with the use of existing tables. In particular, the calculation of confidence limits for $\rho$ is immediate. Theorem 2 is especially useful in investigating the association between sex and some other characteristic, since animal populations consist of approximately half males and half females. As an illustration of the ease with which calculations may be carried out, a problem is considered in which the trait is male and the characteristic is IQ.

The small-sample distribution of $r$ is quite easily found, although it is difficult to deal with when $n$ is even moderately large, asymptotic methods appearing to be more desirable. This is discussed in Section 4.


Received 6/29/53, revised 1/26/54. 

¹ This research was performed while the author was at the Statistical Laboratory, University of California, Berkeley, and was sponsored in part by the Office of Naval Research.









2. Model and asymptotic distribution. Consider $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, a sequence of independent random vectors. Let $X_i$ have the Bernoulli distribution:

$$P(X_i = 1) = p, \quad P(X_i = 0) = q, \quad 0 < p < 1, \quad p + q = 1.$$

Let each $Y_i$ have the mixed distribution with distribution function $F(y) = pF_1(y) + qF_0(y)$, where

$$F_j(y) = P(Y \le y \mid X = j) = \int_{-\infty}^{y} \frac{1}{\tau\sqrt{2\pi}}\, e^{-(t - \mu_j)^2/2\tau^2}\, dt, \qquad j = 0, 1.$$

The following notation will be used. The standardized difference of means, $(\mu_1 - \mu_0)/\tau$, will be denoted by $\Delta$. The variance of the random variable $Z$ will be denoted by $V(Z)$. The fact that $Z$ is asymptotically normal with mean $a$ and variance $b^2$ will be denoted by $Z \sim \mathfrak{N}(a, b^2)$. Of course,

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}.$$

Throughout the paper the indices for all summations run from 1 to $n$. Easy calculation shows that

$$E(X) = p, \quad V(X) = pq, \quad E(Y) = p\mu_1 + q\mu_0, \quad V(Y) = \tau^2(1 + pq\Delta^2),$$
$$E(XY) = p\mu_1, \quad \rho(X, Y) = \Delta\sqrt{pq/(1 + pq\Delta^2)}.$$

Theorem 1.

$$r \sim \mathfrak{N}\left[\rho,\ \frac{\{4pq - \rho^2(6pq - 1)\}(1 - \rho^2)^2}{4npq}\right].$$

As $r$ is a function of sample means which has a total differential at the point given by the expectations of these means, this result is obtainable by a lengthy, but elementary, calculation from a theorem of Cramér [1]. Moments $\mu_{ik} = E\{(X - p)^i [Y - E(Y)]^k\}$ are needed. In this case

$$\mu_{ik} = \tau^k \{p q^i a_k(\Delta) + (-p)^i q\, b_k(\Delta)\},$$

where

$$a_k(\Delta) = \int_{-\infty}^{\infty} \frac{(t + q\Delta)^k}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt, \qquad b_k(\Delta) = \int_{-\infty}^{\infty} \frac{(t - p\Delta)^k}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt.$$

Straightforward analysis leads to Theorem 1.
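As a small numerical companion to Theorem 1, the sketch below evaluates the asymptotic variance and the relation between $\rho$ and $\Delta$. It assumes the variance reads $\{4pq - \rho^2(6pq - 1)\}(1 - \rho^2)^2/(4npq)$ and that $\rho = \Delta\sqrt{pq/(1 + pq\Delta^2)}$; the function names are hypothetical and chosen only for this illustration.

```python
import math


def rho_from_delta(delta, p):
    """Population point-biserial correlation for standardized mean difference delta."""
    q = 1.0 - p
    return delta * math.sqrt(p * q / (1.0 + p * q * delta ** 2))


def asymptotic_variance(rho, p, n):
    """Asymptotic variance of r under the normal-mixture model (assumed form of Theorem 1)."""
    q = 1.0 - p
    return (4 * p * q - rho ** 2 * (6 * p * q - 1)) * (1 - rho ** 2) ** 2 / (4 * n * p * q)


# For fixed rho the variance is smallest when p = q = 1/2.
for p in (0.1, 0.3, 0.5):
    print(p, asymptotic_variance(0.3, p, 100))
```

Holding $\rho$ fixed and varying $p$ in this way makes it easy to see how much precision is lost when the two groups are unbalanced.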


3. A special case. It may easily be shown that the asymptotic variance of $r$ has a minimum for each $\rho$ when $p = q = \tfrac{1}{2}$, since, for each $\rho$,

$$V(r \mid p) - V(r \mid \tfrac{1}{2}) = \frac{1}{n}\,\rho^2 (1 - \rho^2)^2 \left(\frac{1}{4pq} - 1\right) \ge 0.$$

In this event we obtain the greatest precision (in terms of the smallest confidence interval for $\rho$). This we should expect intuitively because of the obvious analogy between our set-up and that of the ordinary two-sample $t$ test, since in the case of that test it is well known that equal sample sizes yield greatest power. For the case $p = \tfrac{1}{2}$ we have from Theorem 1

$$r \sim \mathfrak{N}\left[\rho,\ \frac{1}{2n}(2 - \rho^2)(1 - \rho^2)^2\right].$$

The parameter $\rho$ may be removed from the variance by the variance stabilizing transformation $\phi(x)$ which satisfies the equation

$$\phi'(x) = \frac{\sqrt{2}}{(1 - x^2)\sqrt{2 - x^2}}.$$

The desired solution is $\phi(x) = \sqrt{\tfrac{1}{2}}\, \mathrm{sgn}(x) \cdot \mathrm{sech}^{-1}(1 - x^2)$. The function $\tanh^{-1}(x)$ is given in Table VB of Fisher [2]. Hence it is convenient to express $\mathrm{sech}^{-1}$ in terms of $\tanh^{-1}$ and express our final result as follows.

Theorem 2.

$$\mathrm{sgn}(r) \cdot \tanh^{-1}\sqrt{1 - (1 - r^2)^2} \sim \mathfrak{N}\left[\mathrm{sgn}(\rho) \cdot \tanh^{-1}\sqrt{1 - (1 - \rho^2)^2},\ 2/n\right].$$


Example. Calculations associated with Theorem 2 are quite easy. Consider, for example, that we are sampling school children of some fixed age. Let $X_i = 1$ if the $i$th child is a boy and 0 if the $i$th child is a girl. Let $Y_i$ be the IQ of the $i$th child. Assuming variability of IQ to be the same for the two sexes, we shall use the following data for 25 school children in order to determine 95 per cent confidence limits for $\rho$.


Boys             Girls

106 143 109       93 115 105 107 111
 98 114  98      119 113 117 111  92
109 107  96       91 104  89  85

$\sum x_i = 11$, $\sum y_i = 2626$, $\sum x_i y_i = 1174$, $\sum y_i^2 = 279738$, $r = +.120$, $1.96\sqrt{2/n} = .5542$


From Table VIII of Pearson [5], $1 - (1 - r^2)^2 = .02891$.

From Table VB of Fisher [2], $\mathrm{sgn}(r)\tanh^{-1}\sqrt{1 - (1 - r^2)^2} = .1718$.

Therefore, 95 per cent confidence limits for $\mathrm{sgn}(\rho)\tanh^{-1}\sqrt{1 - (1 - \rho^2)^2}$ are $.1718 \pm .5542$. To find confidence limits for $\rho$, we use the same tables in reverse order to solve the equations

$$\tanh^{-1}\sqrt{1 - (1 - \rho^2)^2} = .7260, \quad \text{and} \quad = .3824.$$

Taking the positive solution for the first and the negative solution for the second, we obtain $(-.263, .465)$ as 95 per cent confidence limits for $\rho$.
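The table look-ups in this example can be imitated with a few lines of code. The following is a minimal sketch, assuming the transformation of Theorem 2 is $\mathrm{sgn}(r)\tanh^{-1}\sqrt{1 - (1 - r^2)^2}$ with asymptotic variance $2/n$; the function name `point_biserial_ci` is hypothetical, and small differences from the tabled values are expected because the printed tables are rounded.

```python
import math


def point_biserial_ci(r, n, z=1.96):
    """Approximate confidence limits for rho when p = 1/2, via the Theorem 2 transformation."""

    def phi(x):
        # sgn(x) * atanh(sqrt(1 - (1 - x^2)^2))
        return math.copysign(math.atanh(math.sqrt(1 - (1 - x * x) ** 2)), x)

    def phi_inverse(y):
        u = math.tanh(abs(y)) ** 2            # equals 1 - (1 - rho^2)^2
        return math.copysign(math.sqrt(1 - math.sqrt(1 - u)), y)

    half_width = z * math.sqrt(2.0 / n)
    centre = phi(r)
    return phi_inverse(centre - half_width), phi_inverse(centre + half_width)


lower, upper = point_biserial_ci(0.120, 25)
print(round(lower, 3), round(upper, 3))
```

The inverse is computed in closed form, since $\tanh^2 y = 1 - (1 - \rho^2)^2$ can be solved directly for $\rho$.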


4. The small sample distribution of r. Let $T = r\sqrt{n - 2}/\sqrt{1 - r^2}$. The small sample distribution of $r$ can be obtained easily by making use of a relation given by Lev [3]. Lev considered not the distribution of $T$, but only the conditional distribution of $T$. More precisely, let $\sum X_i = n_1$ and $n_0 = n - n_1$. Then

$$r(n_0, n_1) = \sqrt{\frac{n_0 n_1}{n}}\ (\bar{Y}_1 - \bar{Y}_0) \Big/ \sqrt{\sum_{i=0}^{1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2}, \tag{4.1}$$

where $\bar{Y}_0 = \sum (1 - X_i) Y_i / n_0$, $\bar{Y}_1 = \sum X_i Y_i / n_1$, $\bar{Y} = \sum Y_i / n$, and $Y_{ij}$ ($j = 1, \ldots, n_i$) denote the $Y$'s for which $X = i$. Expression (4.1) denotes the conditional value of $r$ given $\sum X_i = n_1$. It can be shown (Lev) that

$$\frac{\sqrt{n - 2}\ r(n_0, n_1)}{\sqrt{1 - r^2(n_0, n_1)}} = (\bar{Y}_1 - \bar{Y}_0) \Big/ \sqrt{\frac{n}{n_0 n_1} \cdot \frac{1}{n - 2} \sum_{i=0}^{1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2}.$$

Denoting this quantity by $t(n_0, n_1)$, we have (Lev)

$$t(n_0, n_1) = \left\{\frac{(\bar{Y}_1 - \bar{Y}_0) - (\mu_1 - \mu_0)}{\tau\sqrt{n/n_0 n_1}} + \delta\right\} \Big/ \sqrt{\frac{1}{\tau^2 (n - 2)} \sum_{i=0}^{1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2}.$$

The random variable $t(n_0, n_1)$ has the noncentral $t$ distribution with $n - 2$ degrees of freedom, and parameter of noncentrality

$$\delta = \Delta\sqrt{n_0 n_1 / n} = \rho\sqrt{n_0 n_1 / npq(1 - \rho^2)}.$$

Denoting the density of $t(n_0, n_1)$ by $f(t; n_1, n, p, \rho)$ and the density of $T$ by $g(t; n, p, \rho)$, we have, by the definition of $t(n_0, n_1)$,

$$g(t; n, p, \rho) = \sum_{n_1 = 0}^{n} \binom{n}{n_1} p^{n_1} q^{n - n_1} f(t; n_1, n, p, \rho).$$

When $\rho = 0$, we see that $f(t; n_1, n, p, 0)$ is independent of $n_1$ and $p$, since $\delta = 0$. Hence, $g(t; n, p, 0)$ is also independent of $p$ and is the density of the ordinary $t$ distribution with $n - 2$ degrees of freedom. Thus, to test the hypothesis $H: \rho = 0$ at level of significance $\alpha$, we reject $H$ when $|T| \ge k_\alpha$, where $k_\alpha$ is obtained from a table of the $t$ distribution. The power against an alternative of the form $(p, \rho)$ can be computed from the expression

$$\beta(p, \rho) = 1 - \sum_{n_1 = 0}^{n} \binom{n}{n_1} p^{n_1} q^{n - n_1} \int_{-k_\alpha}^{k_\alpha} f(t; n_1, n, p, \rho)\, dt.$$

The integrals may be evaluated directly from the tables of Neyman and Tokarska [4] for level of significance $\alpha = .05$ or $\alpha = .01$. They use the symbol $\rho$ for parameter of noncentrality, instead of our $\delta$.

If a large sample is available, quicker and sufficiently accurate results may be obtained from Theorem 1. Then the above procedure, which amounts to looking up $n$ values in the Neyman-Tokarska table, can be avoided.
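The relation used in this section, that $T = r\sqrt{n - 2}/\sqrt{1 - r^2}$ coincides with the ordinary pooled two-sample $t$ statistic for the groups $X = 1$ and $X = 0$, can be checked numerically. The sketch below uses a small made-up data set (an assumption for illustration only, not the data of the example) and hypothetical function names.

```python
import math


def point_biserial_T(x, y):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), with r the product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def pooled_t(x, y):
    """Two-sample t statistic comparing the Y's with X = 1 against those with X = 0."""
    g1 = [b for a, b in zip(x, y) if a == 1]
    g0 = [b for a, b in zip(x, y) if a == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    ss_within = sum((b - m1) ** 2 for b in g1) + sum((b - m0) ** 2 for b in g0)
    pooled_var = ss_within / (len(x) - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / len(g1) + 1 / len(g0)))


# Illustrative values only.
x = [1, 1, 1, 0, 0, 0, 0]
y = [106, 98, 109, 93, 119, 91, 104]
print(point_biserial_T(x, y), pooled_t(x, y))
```

Both functions print the same value up to rounding error, which is the algebraic identity underlying the conditional argument above.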


REFERENCES 


[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, pp. 359, 366-367.

[2] R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd (1946), p. 210.

[3] J. Lev, "The point-biserial coefficient of correlation," Ann. Math. Stat., Vol. 20 (1949), pp. 125-126.

[4] J. Neyman and B. Tokarska, "Errors of the second kind in testing 'Student's' hypothesis," J. Amer. Stat. Assn., Vol. 31 (1936), pp. 318-326.

[5] K. Pearson, Tables for Statisticians and Biometricians, Part 1, Cambridge University Press, 1945, p. 20.




A COMPUTING FORMULA FOR THE POWER OF THE ANALYSIS OF 
VARIANCE TEST 


By W. L. Nicholson


University of Oregon¹ and University of Illinois²


1. Summary. A formula for the power of the analysis of variance test is 
derived for the case when the denominator of the F ratio has an even number 
of degrees of freedom. The form employed is particularly adapted to computa- 
tion of the power as a function of the alternative hypothesis with arbitrary 
fixed level of significance and fixed degrees of freedom. For m degrees of freedom 
in the numerator and 2, 4, 6, 8 and 10 in the denominator, the power functions 
are deduced from the general formula, with an indication of their use. 


2. The power function. In the classical analysis of variance test we are in- 
terested in a ratio of the form 


(1) 


where $x_i$ ($i = 1, 2, \ldots, m$) and $y_j$ ($j = 1, 2, \ldots, n$) are distributed $N(\theta_i, \sigma^2)$ and $N(0, \sigma^2)$, respectively. If the null hypothesis, $\theta_i = 0$ ($i = 1, 2, \ldots, m$), is false, it is well known that the distribution of $F$ is completely specified by $m$, $n$ and the single additional parameter

$$\lambda = \frac{1}{2\sigma^2} \sum_{i=1}^{m} \theta_i^2. \tag{2}$$

Therefore, for a predetermined level of significance $\alpha$, the power of the test is a function of $m$, $n$, and $\lambda$. It is [1]

$$P(\lambda \mid a, b; \alpha) = 1 - \sum_{k=0}^{\infty} \frac{e^{-\lambda} \lambda^k}{k!}\, I_x(a + k, b), \tag{3}$$

where $m = 2a$ and $n = 2b$, and

$$I_x(a + k, b) = \frac{1}{B(a + k, b)} \int_0^x t^{a + k - 1} (1 - t)^{b - 1}\, dt \tag{4}$$


Received 2/24/52, revised 2/12/53.
¹ Research under contract with the Office of Naval Research, U. S. Navy.
² Research under contract with the Office of Ordnance Research, U. S. Army.







is the incomplete beta function [2] with $x = x(a, b, 1 - \alpha)$ defined as

$$I_x(a, b) = 1 - \alpha. \tag{5}$$

Limiting the argument to integral values of $b$, it follows that

$$I_x(a + k, b) = \Gamma(a + b + k) \sum_{j=0}^{b-1} \frac{x^{a + k + j} (1 - x)^{b - j - 1}}{\Gamma(a + k + j + 1)\, \Gamma(b - j)}. \tag{6}$$
As in [1] we obtain 
P( | a, b; a) 


(7) e (hus) * ‘I(a + b)a*"* (1 — x)” _s 


(k+1— * 
1 Tat+tk+ Dro — geet! batk+1, 


where 


G 2? 
(8) F(a, B, z) = ee a 
j=0 B® J* 


Here, for any number d, the symbol d is defined by 


qd” = : (d + i) 
1 


Interchange the order of summation, utilize the identity 


j= j= € 


(9) r (—1)" (4 c j ry x jon (1 ro 2), CH = ’ 


and note that P(0 | a, b; a) = a, then (7) reduces to 
P(a|a, b;a) = 1 -@ *“"{(1 — a) 
bos 8 b—j-1 
(10) ‘ fy a2) | - (-(?~ 4 — ~ 1) th. — 1) @- =0 2 |, 
ji(b—j7—A)!IZ!L to a+jt+k J 
For fixed $m$, $n$ and $\alpha$ the power may be rapidly calculated as a function of $\lambda$.


3. An error term. The identity (9) may be replaced by 


(11) fo) 411 — 2)! = Sher 


joo ATT 


I,(d, c). 


z 


Then a similar argument as before gives from (7)

$$P(\lambda \mid a, b; \alpha) = 1 - e^{-\lambda(1 - x)} \sum_{k=0}^{b-1} \frac{[\lambda(1 - x)]^k}{k!}\, I_x(a + k, b - k). \tag{12}$$

Since $I_x(a + k, b - k)$ is a nonincreasing function of $k$ ($k = 0, 1, 2, \ldots$), neglecting the last $b - r$ terms in the general formula (10) will result in an error of

$$R(\lambda \mid a, b, r; \alpha) < I_x(a + r, b - r)\, e^{-\lambda(1 - x)} \sum_{k=r}^{b-1} \frac{[\lambda(1 - x)]^k}{k!}. \tag{13}$$

The finite sum of Poisson terms may be evaluated with the aid of the incomplete $\Gamma$-function,

$$I(u, p) = \frac{1}{\Gamma(p + 1)} \int_0^{u\sqrt{p + 1}} t^p e^{-t}\, dt, \tag{14}$$

which is tabulated [4]. Maximizing (13) with respect to $\lambda$ gives as a uniform bound for the error

$$R(\lambda \mid a, b, r; \alpha) < I_x(a + r, b - r)\, \{I(L/\sqrt{r}, r - 1) - I(L/\sqrt{b}, b - 1)\}, \tag{15}$$

where $L = \left[\prod_{j=1}^{b-r} (b - j)\right]^{1/(b - r)}$.
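4. Special cases. For $b = 1$, $x^a = (1 - \alpha)$, so (10) simplifies to

$$P(\lambda \mid a, 1; \alpha) = 1 - (1 - \alpha) \exp\{-\lambda[1 - (1 - \alpha)^{1/a}]\}. \tag{16}$$

For $b = 2, 3, 4$ and 5 the power functions are

$$P(\lambda \mid a, 2; \alpha) = 1 - e^{-\lambda(1 - x)} [(1 - \alpha) + x^{a+1}(1 - x)\lambda], \tag{17}$$

$$P(\lambda \mid a, 3; \alpha) = 1 - e^{-\lambda(1 - x)} \{(1 - \alpha) + x^{a+1}(1 - x)[(a + 2) - (a + 1)x]\lambda + \tfrac{1}{2} x^{a+2}(1 - x)^2 \lambda^2\}, \tag{18}$$

$$\begin{aligned}
P(\lambda \mid a, 4; \alpha) = 1 - e^{-\lambda(1 - x)} \{ & (1 - \alpha) \\
& + \tfrac{1}{2} x^{a+1}(1 - x)[(a + 3)(a + 2) - 2(a + 3)(a + 1)x + (a + 2)(a + 1)x^2]\lambda \\
& + \tfrac{1}{2} x^{a+2}(1 - x)^2[(a + 3) - (a + 2)x]\lambda^2 + \tfrac{1}{6} x^{a+3}(1 - x)^3 \lambda^3\},
\end{aligned} \tag{19}$$

$$\begin{aligned}
P(\lambda \mid a, 5; \alpha) = 1 - e^{-\lambda(1 - x)} \{ & (1 - \alpha) \\
& + \tfrac{1}{6} x^{a+1}(1 - x)[(a + 4)(a + 3)(a + 2) - 3(a + 4)(a + 3)(a + 1)x \\
& \qquad + 3(a + 4)(a + 2)(a + 1)x^2 - (a + 3)(a + 2)(a + 1)x^3]\lambda \\
& + \tfrac{1}{4} x^{a+2}(1 - x)^2[(a + 4)(a + 3) - 2(a + 4)(a + 2)x + (a + 3)(a + 2)x^2]\lambda^2 \\
& + \tfrac{1}{6} x^{a+3}(1 - x)^3[(a + 4) - (a + 3)x]\lambda^3 + \tfrac{1}{24} x^{a+4}(1 - x)^4 \lambda^4\}.
\end{aligned} \tag{20}$$

Values of the parameter $x = x(a, b, 1 - \alpha)$ corresponding to $\alpha = 0.50, 0.25, 0.10, 0.05, 0.025, 0.01$ and $0.005$ are tabled [3]. Other values may be interpolated from [2].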







Example 1. The power function for $\alpha = 0.05$, $m = 2$ and $n = 8$ would be obtained from (19) with $a = 1$. From [3], $x(1, 4, 0.95) = 0.52713$; substituting gives the power function as

$$P(\lambda \mid 1, 4; 0.05) = 1 - e^{-0.47287\lambda}(0.95000 + 0.34381\lambda + 0.03961\lambda^2 + 0.00136\lambda^3). \tag{21}$$

Example 2. Suppose that two-figure accuracy is desired in calculating the power function for $\alpha = 0.05$, $m = 8$ and $n = 30$. The unabridged form of (10) with $a = 4$ and $b = 15$ would entail evaluating 15 terms. From (15),

$$R(\lambda \mid 4, 15, 8; 0.05) < 0.008.$$

Thus using the first eight terms of (10) would certainly secure the necessary accuracy.
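Example 1 can be reproduced with a short program. The sketch below is written under explicit assumptions: the power is taken to be one minus a Poisson mixture (mean $\lambda$) of incomplete beta functions, the finite form is taken as $1 - e^{-\lambda(1-x)}\sum_{k=0}^{b-1}[\lambda(1-x)]^k I_x(a+k, b-k)/k!$, and $I_x(a, b)$ for integral $b$ is evaluated from the standard finite sum $x^a \sum_{j=0}^{b-1} \binom{a+j-1}{j}(1-x)^j$, which is not necessarily the arrangement used in (10). All function names are hypothetical.

```python
import math


def gen_binom(z, r):
    """Generalized binomial coefficient C(z, r) for real z and integer r >= 0."""
    out = 1.0
    for t in range(r):
        out *= (z - t) / (t + 1)
    return out


def inc_beta_int_b(x, a, b):
    """I_x(a, b) for real a > 0 and integer b >= 1, via a finite sum."""
    return x ** a * sum(gen_binom(a + j - 1, j) * (1 - x) ** j for j in range(b))


def power_finite(lam, a, b, x):
    """Power from the finite sum with b terms (assumed form of the finite formula)."""
    mu = lam * (1 - x)
    total, weight = 0.0, 1.0          # weight = mu^k / k!
    for k in range(b):
        total += weight * inc_beta_int_b(x, a + k, b - k)
        weight *= mu / (k + 1)
    return 1 - math.exp(-mu) * total


def power_series(lam, a, b, x, terms=200):
    """Power from the truncated Poisson series, used here only as a cross-check."""
    total, weight = 0.0, math.exp(-lam)   # weight = e^{-lam} lam^k / k!
    for k in range(terms):
        total += weight * inc_beta_int_b(x, a + k, b)
        weight *= lam / (k + 1)
    return 1 - total


# Example 1: alpha = 0.05, m = 2, n = 8, so a = 1, b = 4; for a = 1, x = 1 - alpha^(1/b).
x_val = 1 - 0.05 ** 0.25
print(power_finite(1.0, 1, 4, x_val), power_series(1.0, 1, 4, x_val))
```

For $a = 1$ the percentage point has the closed form used above; for other values of $a$ it would have to be taken from a table such as [3] or computed by a root-finder, which this sketch does not attempt.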


REFERENCES 


[1] P. C. Tang, "The power function of the analysis of variance test with tables and illustrations of their use," Statistical Research Memoirs, Vol. 2 (1938), p. 143.

[2] Karl Pearson, Tables of the Incomplete Beta Function, Cambridge University Press, 1934.

[3] Catherine M. Thompson, "Tables of percentage points of the incomplete beta-function," Biometrika, Vol. 32 (1941), pp. 151-153.

[4] Karl Pearson, Tables of the Incomplete Γ-Function, His Majesty's Stationery Office, London, 1922.




POWER UNDER NORMALITY OF SEVERAL NONPARAMETRIC TESTS 


By W. J. Dixon
University of Oregon¹

1. Summary. Presented are tabulations of the power and power efficiency 
of four nonparametric tests (rank-sum, maximum deviation, median, and total 
number of runs) for the difference in means of two samples drawn from normal 
populations with equal variance. The cases considered are for equal sample 
sizes of three, four and five observations and alternatives $\delta = |\mu_1 - \mu_2|/\sigma$.


2. Introduction. One method of comparison of various nonparametric tests 
is a study of their performance under the assumption of normality. An ad- 
vantage of this method is the wide use of the normal assumption. Disadvantages 
are the limitation to a particular type of distribution and the extensive computa- 
tion necessary. 

The computation of power under normality is simplest for small samples and 
small levels of significance. This fact has guided the present study, but it is


Received 6/8/53, revised 2/6/54. 
¹ Research sponsored in part by the Office of Naval Research.







TABLE I 


Power and power efficiency for two samples of equal size, $N_1 = N_2 = N$, from normal populations with equal variances $\sigma^2$ and means $\mu_1$ and $\mu_2$, for each of four tests against the alternative $\delta = |\mu_1 - \mu_2|/\sigma$.

W = rank sum, D = maximum absolute deviation, M = median, R = total number of runs


4 5 
W,D,M,R D M 
1/10 1/35 1/126 2/126 4/126 10/126 26/126 


Pow Eff Pow Eff. Pow. Eff. Pow. Eff. Pow. Eff. Pow Eff. Pow. Eff 


100 .975 .029 .965 .008 .957, .016 .962 .032 .965 .080 . 206 
-lll 98  .035 | .96 | .O11 | .96 

143 98 | .055 .96) .021  .95 | .040 .97 | .072  .97 .144 

195 98 090 .96 .041 .95 

264 98 14! 96 .074 | .95 | .128 .96 .210 


347 97 209 4695); .124, .94 

.438 .97 . 293 : .192 .94 .o0l 
97 388 ; ae 93 
97 .489 i 377 .93 .530 
96 682. .587 | .92. .744 
6 830 |. .768 «91 889 
95 923 .890 90 .961 
95 970. .956  .89  .989 


95 990. 985 88 998 
94 997. .995 . .88 


’ 


hoped that the considerable differences evident in these small sample cases 
along with what is known about asymptotic results, will be of help to the statis- 
tician using nonparametric tests. 

In addition to reporting the actual power for various alternatives, comparison has been made with the t-test by use of a power efficiency function (Table I). This function $P_e(\delta)$ is defined as the ratio of the sample size of the t-test which results in equal power for a given alternative, $\delta$, to the sample size of the nonparametric test under consideration. Fractional sample sizes for the t-test are found by interpolation on sample size to obtain a power equal to that of the nonparametric test. This function has already been used [1] for the sign test. Powers of the t-test used for this comparison were computed as by Nicholson [3].

Since the power efficiency of a nonparametric test will, in general, also depend upon the level of significance, comparisons of different tests are made difficult by the fact that the levels of significance which naturally occur for each test are not the same. To make the comparison simpler, power efficiency is also given (Table II) for the tests randomized to a single level of significance, $\alpha = .025$. For example, the rank sum test has natural levels $\alpha = .0159$ and






TABLE II 
Power and power efficiency of three tests, each randomized to level of significance $\alpha = .025$, for $N_1 = N_2 = 5$


Test Rank sum Max. abs. dev. Median 
Power Eff. Power Eff. Power 


.025 .025 .025 
me 1 .051 81 .045 
173 ‘ .135 ‘ AZ 
.376 ’ . 284 oat . 240 
614 | ; .476 ; .421 
.810 ‘ .668 .620 
.925 : .819 .76 .788 
976 | a 915 “¥ .899 
.994 88 966 | ov .960 
.999 .988 ; .986 




$\alpha = .0317$, so that use of the former with chance .425 and the latter with chance .575 will result in an effective $\alpha = .025$.
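The randomization weights quoted here follow from a one-line calculation: if the test with the smaller natural level is used with chance $w$ and the other with chance $1 - w$, the effective level is the weighted average of the two. The sketch below assumes the natural levels are exactly 2/126 and 4/126 (which round to .0159 and .0317); the function name is hypothetical.

```python
from fractions import Fraction


def mixing_weight(alpha_low, alpha_high, target):
    """Chance of using the alpha_low test so that the randomized test has level target."""
    return (alpha_high - target) / (alpha_high - alpha_low)


w = mixing_weight(Fraction(2, 126), Fraction(4, 126), Fraction(25, 1000))
print(float(w), float(1 - w))
```

Exact fractions are used so that the weights are not affected by rounding of the natural levels.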

Since for $\delta = 0$ all the power curves agree in ordinate and slope, the limiting power efficiency function as $\delta$ approaches zero may be obtained by interpolating among the second derivatives of the power functions of $t$ in the same manner as among the ordinates for $\delta$ not zero.

Computation for the limiting power efficiency for the one sided rank sum test randomized to $\alpha = .0125$ for $N_1 = N_2 = 5$ was made by interpolating among the first derivatives of the power functions for one-sided t-tests. The same power efficiency, .964, was obtained as for the corresponding two-sided test with $\alpha = .025$.

Table III gives these limiting power efficiencies for the rank sum test for sample sizes $N_1 \le N_2 \le 5$. The cases indicated by an asterisk apply also to the maximum deviation, median, total number of runs tests. Of course they apply also to any test which has, for the stated $\alpha$, a critical region corresponding to


TABLE III
Limiting power efficiency of rank sum test W against the alternative $|\mu_1 - \mu_2|/\sigma = \delta = 0$, for various levels of significance $\alpha$ and various sample sizes $N_1$ and $N_2$ from normal populations with equal variances $\sigma^2$ and means $\mu_1$ and $\mu_2$. Values marked with asterisk (*) apply also to maximum absolute deviation (D), median (M), total number of runs (R), and similar tests.

N_1, N_2    | 2,2   | 2,3  | 2,4         | 3,3         | 4,4   | 5,5                 | ∞, ∞
α           | 1/3   | 1/5  | 2/15  4/15  | 1/10  1/5   | 1/70  | 1/126  2/126  4/126 | 0 < α < 1
Efficiency  | .995* | .900 | .971* .966  | .975* .973  | .965* | .957*  .962   .965  | .9549 = 3/π







the case of all observations in one sample greater than all observations in the 
second sample, or vice versa. The limiting value, $3/\pi$, for large samples is given by others [2], [5].


3. Theory. The computation of power requires the evaluation, numerically 
in most cases, of the integrals representing the probabilities that various sample 
configurations lying in the critical region will occur under various alternative 
assumptions. Two such expressions will be displayed. 

The first case corresponds to all observations in one sample greater than all observations in the second sample, or vice versa. Here $\alpha = 2(N!)^2/(2N)!$ and the rank sum statistic equals $\sum_{i=1}^{N} i$ for two samples of size $N$. The power $P(\delta)$ for normal cdf $F(x)$ is

$$N \int_{-\infty}^{\infty} F^{N-1}(x)\, F^{N}(\delta - x)\, dF(x) + N \int_{-\infty}^{\infty} F^{N-1}(-x)\, F^{N}(x - \delta)\, dF(x).$$
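The quadrature for this first case can be sketched with nothing more than Simpson's rule. The block below is an illustration under assumptions: it takes the power to be the sum of the two one-sided probabilities, $N\int F^{N-1}(x)F^{N}(\delta - x)\,dF(x) + N\int F^{N-1}(-x)F^{N}(x - \delta)\,dF(x)$, with $F$ the standard normal cdf, and it truncates the integration range at $\pm 10$. The function names and the step count are choices made here, not taken from the paper.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def extreme_case_power(N, delta, lo=-10.0, hi=10.0, steps=4000):
    """Probability that one sample lies entirely above the other (either order).

    Composite Simpson's rule on [lo, hi]; steps must be even.
    """

    def integrand(x):
        first = norm_cdf(x) ** (N - 1) * norm_cdf(delta - x) ** N
        second = norm_cdf(-x) ** (N - 1) * norm_cdf(x - delta) ** N
        return N * (first + second) * norm_pdf(x)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0


# At delta = 0 the power equals the level 2 (N!)^2 / (2N)!.
for N in (3, 4, 5):
    print(N, extreme_case_power(N, 0.0), 2 * math.factorial(N) ** 2 / math.factorial(2 * N))
```

Evaluating the function on a grid of $\delta$ values gives a power curve for this critical region that can be set beside the first column of each sample size in Table I.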


In the second case, where the smallest observation and only this smallest observation of one sample lies between the two largest observations of the other sample, the rank sum statistic equals $1 + \sum_{i=1}^{N} i$ for two samples of size $N$. In this case the power $P(\delta)$ for normal cdf $F(x)$ is


N°(N — 1) | Py — 8) FY*(—2) F(x — 8) dF(y) dF (a) 


-N*(N — 1) | / F**(y) F(y — 6) FX""(6 — x) dF(y) dF(z) 


+ n’ | FP’ (2) F*-"(6 — x) F(a — 8) dF (x) — n’/ F*~'(—2) F*(a — 6) dF(z). 


Similar expressions may be written down for larger $\alpha$, and for the other tests. The quadratures were performed for $\delta = .25(.25)2.00, 3.00, 4.00$ for $N = 3, 4$ and for $\delta = .50(.50)3.50, 4.50$ for $N = 5$; other values were filled in by subtabulation. The power curves of the nonparametric tests and the t-test, when subjected to the transformation

$$P(\delta) = \int_{-\infty}^{x(\delta)} (2\pi)^{-1/2} e^{-t^2/2}\, dt,$$

yield an $x(\delta)$ essentially linear in $\delta$ when $\delta$ is not close to zero. Consequently, all interpolations were performed on $x(\delta)$. This procedure was also used in interpolation for the power efficiency function. Second differences were adequate in most cases.

An extensive bibliography on the above tests is given by Savage [4]. 


4. Conclusions.

Table I. The four nonparametric tests considered have high power efficiencies for very small samples and small $\alpha$, when compared with the t-test for normal alternatives. Power efficiency decreases slightly for more distant alternatives. As the level of significance increases, the power efficiency of the rank sum test increases slightly whereas the power efficiencies of the median and maximum deviation tests decrease.

Table II. When tests for samples of size 5 are randomized to the single level of significance $\alpha = .025$, it is easy to compare the tests and note that the rank sum test has greater power than the median and maximum deviation tests. Particularly for near alternatives, the maximum deviation test has greater power than the median test.

Table III. The local power efficiencies for the rank sum test are very high. For all cases computed they are greater than $3/\pi$, the limiting local power efficiency for large samples.

REFERENCES 

[1] W. J. Dixon, “Power functions of the sign test and power efficiency for normal
alternatives,” Ann. Math. Stat., Vol. 24 (1953), pp. 467-473.

[2] A. M. Mood, “On the asymptotic efficiency of certain nonparametric two-sample tests,”
Ann. Math. Stat., Vol. 25 (1954), pp. 514-522.

[3] W. L. Nicholson, “A computing formula for the power of the analysis of variance
test,” Ann. Math. Stat., Vol. 25 (1954), pp. 607-610.

[4] I. R. Savage, “Bibliography of nonparametric statistics and related topics,” J. Amer.
Stat. Assn., Vol. 48 (1953), pp. 844-906.

[5] H. R. van der Vaart, “Some remarks on the power of Wilcoxon's test for the problem
of two-samples, I and II,” Indagationes Math., Vol. 12 (1950), pp. 146-172.


Addendum 


Papers on this topic appearing since submission of this paper include 
[6] E. L. Lehmann, “The power of rank tests,” Ann. Math. Stat., Vol. 24 (1953), pp. 23-43.
[7] B. L. van der Waerden, “Order tests for the two sample problem,” Nederl. Akad.
Wetensch. Proc. Ser. A, Vol. 56 (1953), pp. 303-316.




A REMARK ON THE JOINT DISTRIBUTION OF CUMULATIVE SUMS 


By Herbert Robbins
University of North Carolina and Columbia University 


Received 9/12/53.

Let $X_k$, $k = 1, \dots, n$, be any finite number n of independent random
variables with respective distribution functions $F_k(x) = \Pr[X_k \le x]$. Let
$T_k = X_1 + \cdots + X_k$ be the successive cumulative sums of the $X_k$, with individual
distribution functions $G_k(t) = \Pr[T_k \le t]$ and joint distribution function
$G(t_1, \dots, t_n) = \Pr[T_1 \le t_1, \dots, T_n \le t_n]$. Since the $T_k$ are not in general
stochastically independent, the function $G(t_1, \dots, t_n)$ will not in general be
equal to the product of the n functions $G_k(t_k)$, but we shall show that the
inequality

$$G(t_1, \dots, t_n) \ge \prod_{k=1}^{n} G_k(t_k) \tag{1}$$

always holds. This is intuitively more or less obvious, but the proof is not
entirely trivial.
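
Inequality (1) can be checked exactly for small discrete cases. The following Python sketch is an illustration only, not part of the note: the function name `joint_and_product`, the input format (one `{value: probability}` dictionary per variable) and the coin-flip example are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import accumulate, product
from math import prod


def joint_and_product(distributions, thresholds):
    """Return (G(t_1..t_n), prod_k G_k(t_k)) by exact enumeration.

    distributions: one dict {value: probability} per X_k (hypothetical format).
    thresholds:    the bounds t_1, ..., t_n.
    """
    n = len(distributions)
    joint = Fraction(0)
    marginals = [Fraction(0)] * n
    # Enumerate every outcome (x_1, ..., x_n) of the independent X_k.
    for outcome in product(*(d.items() for d in distributions)):
        values = [value for value, _ in outcome]
        weight = prod((p for _, p in outcome), start=Fraction(1))
        sums = list(accumulate(values))  # cumulative sums T_1, ..., T_n
        below = [t_k <= bound for t_k, bound in zip(sums, thresholds)]
        if all(below):
            joint += weight
        for k, ok in enumerate(below):
            if ok:
                marginals[k] += weight
    return joint, prod(marginals, start=Fraction(1))


# Two independent steps of +1 or -1, each with probability 1/2 (illustrative choice).
coin = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
joint, bound = joint_and_product([coin, coin], [0, 0])
```

Exact rational arithmetic is used so that the comparison of the joint probability with the product of the marginals is not blurred by rounding.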

Lemma 1 (Tchebycheff¹). Let X be a random variable and let u(x) and v(x)
be any two nonincreasing functions of x for which Eu(X) and Ev(X) are finite;
then

$$E[u(X)v(X)] \ge Eu(X) \cdot Ev(X). \tag{2}$$
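
As a small numerical illustration of (2), the sketch below evaluates both sides exactly for a discrete X. The helper names `expectation` and `chebyshev_gap`, the uniform distribution on {0, 1, 2, 3} and the two functions are hypothetical choices for this example.

```python
from fractions import Fraction


def expectation(pmf, g):
    """E[g(X)] for a discrete X given as {value: probability}."""
    return sum(p * g(x) for x, p in pmf.items())


def chebyshev_gap(pmf, u, v):
    """E[u(X)v(X)] - E[u(X)]E[v(X)]; nonnegative when u and v are both nonincreasing."""
    return expectation(pmf, lambda x: u(x) * v(x)) - expectation(pmf, u) * expectation(pmf, v)


# X uniform on {0, 1, 2, 3}; u(x) = -x and v(x) = -x^3 are both nonincreasing.
uniform = {x: Fraction(1, 4) for x in range(4)}
gap = chebyshev_gap(uniform, lambda x: -x, lambda x: -(x ** 3))
```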


Lemma 2. Using the notation of the first paragraph, let A, B denote any partition
of the set of integers $1, \dots, n$ into two disjoint subsets; then

$$\Pr[T_k \le t_k \text{ for all } k = 1, \dots, n] \ge \Pr[T_k \le t_k \text{ for all } k \in A] \cdot \Pr[T_k \le t_k \text{ for all } k \in B]. \tag{3}$$


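Proof. Induction on n. The theorem is trivially true for n = 1, since one
of the sets A, B must be empty. If it is true for n − 1, then for any fixed $x \le t_1$,

$$\begin{aligned}
\Pr[T_k \le t_k \text{ for all } k = 1, \dots, n \mid X_1 = x]
&= \Pr[X_2 + \cdots + X_k \le t_k - x \text{ for all } k = 2, \dots, n] \\
&\ge \Pr[X_2 + \cdots + X_k \le t_k - x \text{ for all } k \in A - \{1\}] \cdot \Pr[X_2 + \cdots + X_k \le t_k - x \text{ for all } k \in B - \{1\}] \\
&= \Pr[T_k \le t_k \text{ for all } k \in A - \{1\} \mid X_1 = x] \cdot \Pr[T_k \le t_k \text{ for all } k \in B - \{1\} \mid X_1 = x] \\
&= \Pr[T_k \le t_k \text{ for all } k \in A \mid X_1 = x] \cdot \Pr[T_k \le t_k \text{ for all } k \in B \mid X_1 = x].
\end{aligned} \tag{4}$$

The inequality between the first and last members of (4) remains valid even for
$x > t_1$, since then the first member and one of the two factors of the last
member are 0. Thus, setting

$$u(x) = \Pr[T_k \le t_k \text{ for all } k \in A \mid X_1 = x], \qquad v(x) = \Pr[T_k \le t_k \text{ for all } k \in B \mid X_1 = x], \tag{5}$$

we have for all x $\Pr[T_k \le t_k \text{ for all } k = 1, \dots, n \mid X_1 = x] \ge u(x) \cdot v(x)$.
Integrating from $-\infty$ to $\infty$ with respect to the distribution function $F_1(x)$ of $X_1$,
we obtain

$$\Pr[T_k \le t_k \text{ for all } k = 1, \dots, n] \ge \int_{-\infty}^{\infty} u(x) v(x)\, dF_1(x). \tag{6}$$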


It is clear from the definitions (5) that both u(x) and v(x) are bounded and
nonincreasing functions of x, and hence by (6) and (2)

$$\Pr[T_k \le t_k \text{ for all } k = 1, \dots, n] \ge \int_{-\infty}^{\infty} u(x)\, dF_1(x) \cdot \int_{-\infty}^{\infty} v(x)\, dF_1(x),$$

which proves (3).

Theorem. If $A_1, \dots, A_r$ form a partition of the set of integers $1, \dots, n$
into any number r of disjoint subsets, then

$$\Pr[T_k \le t_k \text{ for all } k = 1, \dots, n] \ge \prod_{j=1}^{r} \Pr[T_k \le t_k \text{ for all } k \in A_j].$$

In particular, setting r = n, $A_j = \{j\}$, $j = 1, \dots, n$, (1) holds.

Proof. Induction on r, using Lemma 2.

¹ See Hardy, Littlewood, and Pólya, Inequalities, Cambridge (1934), p. 43; also M.
Biernacki, “Sur une inégalité entre les intégrales due à Tchebycheff,” Annales Univ.
Mariae Curie-Skłodowska, Lublin, Sect. A, Vol. 5 (1951), pp. 23-29.




ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Pasadena meeting of the Institute,
June 18-19, 1954)


1. The Integral of a Symmetric Unimodal Function over a Symmetric Convex
Set and Some Probability Inequalities. T. W. Anderson, Columbia University
and Stanford University.


The integral over an interval of fixed length of a symmetric unimodal function is
maximized if the interval is centered at the origin; in fact, the value of the integral is a
nonincreasing function of the distance of the midpoint of the interval from the
origin. A generalization of this result to n-space is the following: Theorem 1. Let E be a
convex set in n-space, symmetric about the origin. Let $f(x) \ge 0$ be a function such that
$f(x) = f(-x)$, $\{x \mid f(x) \ge u\}$ is convex for every $u$ $(0 < u < \infty)$, and
$\int_E f(x)\, dx < \infty$ (in the Lebesgue sense). Then
$\int_E f(x + ky)\, dx \ge \int_E f(x + y)\, dx$ for $0 \le k \le 1$. A direct consequence
is that the distribution of X + Y is more spread out than the distribution of X. Theorem
2. Let X be a random vector with density f(x) satisfying the conditions of Theorem 1; let Y
be an independent random vector; and let E be a convex set, symmetric about the origin. Then
$\Pr\{X + kY \in E\} \ge \Pr\{X + Y \in E\}$ for $0 \le k \le 1$. Inequalities are derived for
distributions of functions of random variables such as $\sum X_i^2$ and $\max_{1 \le i \le n} |X_i|$ and
corresponding functionals of stochastic processes. Another application is to show that certain tests
of location parameters are unbiased. (Work supported by the Office of Naval Research.)
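
The one-dimensional statement at the start of this abstract can be illustrated numerically. The Python sketch below takes the standard normal density as the symmetric unimodal function; this choice, the helper name `interval_mass` and the midpoints used are assumptions for the illustration.

```python
from statistics import NormalDist


def interval_mass(center, half_length, dist=NormalDist()):
    """Integral of the density of `dist` over [center - half_length, center + half_length]."""
    return dist.cdf(center + half_length) - dist.cdf(center - half_length)


# Mass of an interval of fixed length 2 as its midpoint moves away from the origin.
masses = [interval_mass(c, 1.0) for c in (0.0, 0.5, 1.0, 2.0)]
```

In the notation of Theorem 1 with n = 1 and E = [−1, 1], the midpoints play the role of ky for increasing k, and the masses should not increase as the midpoint moves outward.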


2. The Spectral Method of Hypothesis Testing Concerning Continuous Gaus- 
sian Stationary Random Processes. R. C. Davis, Hughes Tool Company. 


Present rigorous methods of hypothesis testing concerning Gaussian stationary random
functions depending upon a continuous parameter (in which a process is observed only
during a finite time interval of duration T) have been based upon an analysis carried out
in the time domain. In order to determine the sample decision function by this method
for testing even a simple hypothesis against a single alternative, it is necessary to solve
two homogeneous integral equations of the second kind. On the other hand, by using
Cramér's representation of a continuous stationary process (possessing a continuous
covariance function) and obtaining by convolution the resultant spectrum obtained in finite
observation time, the likelihood ratio can be obtained explicitly without solving any
integral equations. Moreover, the spectral method is useful in making approximate
calculations when one is limited by practical considerations to the obtaining of “information”
only in a finite pass band of frequencies.


3. Note on the Distribution of a Definite Quadratic Form. James Pachares,
Naval Air Missile Test Center, Point Mugu.


An expression is derived for the distribution of a definite quadratic form in
independent N(0, 1) variates which depends only on the value of the determinant of the form and
on the moments of a quadratic form whose matrix is the inverse of the original quadratic
form. This expression is an alternating series which converges absolutely and is such that
if we stop with any even power of the series we have an upper bound and if we stop with
any odd power of the series a lower bound to the cumulative distribution function. If
$Q_n = \tfrac{1}{2} \sum_{i,j=1}^{n} a_{ij} x_i x_j$, $Q_n^* = \tfrac{1}{2} \sum_{i,j=1}^{n} a^{ij} x_i x_j$, where $x_1, \dots, x_n$ are independent N(0, 1)
variates, $a_{ij} > 0$, $i, j = 1, \dots, n$, and $F_n(t) = \Pr(Q_n \le t)$, then
$F_n(t) = t^{n/2} |A|^{-1/2} \sum_{k=0}^{\infty} (-t)^k E(Q_n^*)^k / [k!\, \Gamma(k + 1 + n/2)]$, where $|A|$ is the determinant of the matrix $(a_{ij})$, and
$E(Q_n^*)^k$ is the kth moment of $Q_n^*$.


4. On Simultaneous Analysis of Variance Test. (By Title.) K. V. Ramachandran,
University of North Carolina.


In situations involving the testing of the significance of k mean squares the usual
method of analysis gives tests which are not independent. Instead of combining these
hypotheses and using an F test, one would prefer to make simultaneous decisions
regarding the hypotheses. In these situations a simultaneous analysis of variance test has been
proposed by M. N. Ghosh. In this paper we have shown that the power of this test is a
monotonic function of the deviation parameters. If $s_i^2$ is the mean square corresponding
to the hypothesis $H_i$ $(i = 1, 2, \dots, k)$ then in the case when all $s_i^2$'s are based on the same
number of degrees of freedom (d. f.) n, it is shown how the simultaneous test could be
carried out using the distribution of the Studentized largest chi-square
$u = [\max(s_1^2, s_2^2, \dots, s_k^2)]/s_0^2$ where $s_0^2$ is an independent chi-square variable with m d. f. The exact
distribution of $u$ has been obtained and upper 5 per cent points tabulated. In the general
case when $n_i$ is the number of d. f. associated with $s_i^2$, certain recurrence relations have
been obtained to facilitate the computation of the percentage points.


5. The Optimum Character of a Certain Wald Sequential Test. J. V. Breakwell,
North American Aviation, Inc. Introduced by T. E. Harris.


A recent paper by the author in the Journal of the Operations Research Society of 
America contains formulas and graphs with the aid of which the parameters of a Wald- 
type sequential test for the fraction of defectives may be chosen so as to minimize the 
maximum risk, the loss functions and the cost of test being assumed to have certain linear 
forms. This ‘“‘optimization’’ was carried out only with the aid of certain approximations, 
valid for large average sample numbers, equivalent to regarding the random walk asso- 
ciated with the sequential test as a continuous curve. 

The present paper demonstrates that the first variations of both acceptance-probability
and average sample number corresponding to arbitrary (small) deformations of
the straight boundaries of the continuous random walk are identical with those
corresponding to parallel shifts of these boundaries by certain weighted average deformations.
The paper proceeds to demonstrate that the first variation in maximum risk is zero when
the boundaries undergo arbitrary deformations accompanied by an appropriate tilting of
both boundaries to insure that the maximum risk associated with rejecting a good product
balances that due to accepting a poor product.

This result is a strong indication that the optimum Wald-type sequential test is opti- 
mum among all possible sequential tests in the light of the assumed risk function and to 
the extent that the continuous random walk is a valid approximation. 


6. An Optimum Decision Procedure for Ranking Means of Normal Populations.
(By Title.) K. C. Seal, University of North Carolina.


Among the infinite class of decision procedures (as suggested in “On a property of a
class of decision procedures for ranking means of normal populations” (abstract), Eastern
Regional meeting, Gainesville, Florida, March, 1954) it is shown that the optimum rule
may be taken to correspond to $c_i = 1/n$ $(i = 1, \dots, n)$. This rule has the following
desirable property: If among (n + 1) given normal populations n populations are identical
with $N(\xi, \sigma^2)$ and if the remaining population is $N(\xi + \delta, \sigma^2)$ then the probability of (i)
either retaining the most desirable population $N(\xi + \delta, \sigma^2)$ when $0 < \delta < \infty$ in the selected
group, (ii) or rejecting the most undesirable population $N(\xi + \delta, \sigma^2)$ when $-\infty < \delta < 0$
from the group is approximately maximum for the above rule. (This research was
supported in part by the United States Air Force under Contract AF 18(600)-83.)


7. On the Central Limit Theorem for $d_n$-Dependent Variables. (By Title.) P. H.
Diananda, University of North Carolina and University of Malaya.


Let (A){Xn4}(@ = 1,-+++ , on jn = 1, 2,°++ ) be a double sequence of random variables 
(r. v.’s). If ther. v.’8 (Xana ,*** , Xn.a) and (Xnw,°** , Xn.»,) are independent whenever 
b — a> d,, then the sequence (A) is said to be d,-dependent. Let S, = Xn. + °° 4 
Knives 8, = E(S%), 8 = E(Xna tees + Xn.) and d, = d, + 1. Suppose that (A) is a 
d,-dependent sequence of r. v.’s with zero means and finite variances such that as n — ~ 
(1) lim inf (8./8.+/d, > Oand ’2) for every fixed ¢ > 0, (1/s%,)D/% Stet>eenl VR a dF,.3(z) — 
0, where F,,,;(z) is the distribution function of X,,; (¢ = 1,°*: , m3 = 1,2,°*- ). Then 
$S_n/s_n$ is asymptotically distributed as a standardized normal variable. This result
improves the main result of an earlier paper, “On the central limit theorem for m-dependent
variables,” (abstract to appear in Ann. Math. Stat.) in three directions: (i) bounded
variances are replaced by finite variances, (ii) m-dependent r. v.'s are replaced by
$d_n$-dependent r. v.'s and (iii) the sequence $X_1, X_2, \dots$ is replaced by the sequence (A). In proving
this theorem and its vector analogue the methods of the previous paper are used in
modified forms. In the case of independent r. v.'s ($d_n = 0$) (1) is automatically satisfied and (2)
reduces to the Lindeberg condition. The main theorem is thus an extension of Lindeberg's
theorem.


8. Estimation of a Selection Function. (Preliminary Report.) Douglas G.
Chapman, University of Washington.


Let X, Y be random variables with density functions $f(x)$, $r^{-1}\varphi(x) f(x)$
respectively where $0 \le \varphi(x) \le 1$ and $r = \int_{-\infty}^{\infty} \varphi(x) f(x)\, dx$. $\varphi(x)$ is known as a “selection
function”. Numerous studies have been made about “selected” distributions where the
selection function is unknown and reduces to the characteristic function of a semi-infinite
interval (“truncation” on the right or left) or where the selection function is determined
by the experimenter. In many situations the selection function is unknown and the
statistician must obtain point or interval estimates of $\varphi$ from n + m independent observations
$(x_1, \dots, x_n; y_1, \dots, y_m)$ of (X, Y). Large sample procedures are derived for such estimates
under several assumptions as to the functional form of $\varphi$ and of $f$. The estimation of r,
the degree of selection, is also studied. Consideration is also given to the problem of
estimating the selection function for grouped data.





9. On the Power of a Distribution-free One-Sided Test of Fit Against
Stochastically Comparable Alternatives. (Preliminary Report.) Z. W. Birnbaum
and Ernest M. Scheuer, University of Washington.


Let the hypothesis H and the alternative G be continuous cumulative distribution
functions, and $F_n$ the empirical distribution function corresponding to a sample of size
n of a random variable with distribution G. It is known that, if H = G, the statistic
$D_n^+ = \sup_s \{H(s) - F_n(s)\}$ has a probability distribution independent of H. In the present paper
a sharp lower bound is obtained for the power of a test of fit based on $D_n^+$, for the class
of alternatives G such that $\sup_s \{H(s) - G(s)\} = \delta$ is a given positive number and
$H(s) \ge G(s)$ for all s.


10. A Statistical Method of Determining Relationships between Test Speci- 
fication Limits and Performance Specification Limits. Bert D. Levenson, 
Hughes Aircraft Company. 


In many production processes the problem arises of determining whether or not a
particular parameter of a number of similar devices conforms to a predetermined performance
specification. When sufficiently accurate measurement techniques are available, perform- 
ance characteristics may be determined quite readily. However, when sizable errors are 
introduced by measurement techniques, the determination of performance characteristics 
obviously becomes more complex. In many of these instances the measured values of the 
parameters are classified in accordance with test specification limits with the objective 
of properly classifying the actual values of the parameters relative to performance specifi- 
cation limits. It is the purpose of this article to explore the nature of the relationships 
which arise as a result of this type of test procedure. The discussion is confined to the 
problem of measurement of a single parameter. A three dimensional geometric analog 
has been devised as an aid in visualizing the general as well as specific solutions to the 
problem. Observations have been made based upon specific calculations for some special 
cases with normal distributions. It is shown that in many instances it may be highly de- 
sirable to establish test specification limits outside of performance specification limits. 


11. A Method of Specification, Testing, and Evaluation of Missile Systems.
E. J. Althaus, S. C. Morrison, and W. R. Tate, Hughes Aircraft
Company.


In tests of complex systems, test errors may be of the order of magnitude of the toler- 
ances on the parameter being measured. Such large errors not only make uncertain the 
quality of accepted product, but also cause inefficiency through rejection of good product 
in test. 

A procedure is discussed for minimizing both these adverse effects. Rejection of good 
product in test is reduced by the use of relaxed test tolerances, while quality is assured by 
tight adjustment tolerances and by quality control analysis of test data. Specification in 
statistical terms and advance estimation of test errors are considered part of a planned 
production scheme. 

Characteristic aspects of the program are the following: (1) Production procedure is 
set up on the basis of statistical predictions of error distributions. (2) Test error
(including component instability) is quantitatively made a part of quality control.


12. Discovery Sampling. James R. Crawford, Lockheed Aircraft Corporation.


A new method of acceptance sampling by use of the range or the truncated range of a
small sample is developed. The purpose is to take advantage of the increased information
afforded by variables inspection yet minimize the knowledge of mathematics needed by
the inspector. The technique under investigation is that of adding and subtracting the
value of the sample range, or a portion thereof, to its upper and lower values, respectively.
If the sum and difference are still within the prescribed tolerances the lot is accepted.
Plans of this type are found to yield approximately the same results as attribute sampling
plans requiring twice the sample size. The inspector need only know addition and
subtraction for their use.
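
The acceptance rule described in this abstract can be sketched in a few lines of Python. This is only an illustration of the verbal description: the function name `accept_lot`, the `fraction` parameter and the sample values are hypothetical, and the truncated-range variant is not covered.

```python
def accept_lot(sample, lower_tol, upper_tol, fraction=1.0):
    """Range-based acceptance rule sketched from the abstract.

    fraction: portion of the sample range to add and subtract (hypothetical
    parameter; the abstract does not give specific values).
    """
    smallest, largest = min(sample), max(sample)
    margin = fraction * (largest - smallest)
    # Add the margin to the largest value and subtract it from the smallest;
    # accept only if both results stay within the prescribed tolerances.
    return largest + margin <= upper_tol and smallest - margin >= lower_tol


# Illustrative measurements against tolerance limits of 90 and 110.
decision = accept_lot([98, 100, 102], lower_tol=90, upper_tol=110)
```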


13. Evaluation of Quality through Demerit Rating System. Harry G. Romig,
International Telemeter Corporation.


Where inspection and tests are made for a series of specified requirements, these results 
must be properly analysed to obtain maximum benefits. Such requirements cover charac- 
teristics and other features, termed Inspection Items. When such Items are inspected by 
the Method of Attributes it is economical and efficient to classify them with respect to 
their importance or seriousness into definite classes, such as Critical, Major, Minor and 
Incidental. Various classifications, such as three-fold, four-fold, and five-fold, with their 
assigned Demerit Weights are discussed and the mathematical relations pertaining to their 
use are developed. Various uses of these systems in evaluating the quality of different 
processes, products and activities are presented. The nature of these distributions depicted 
by the multinomial and approximations thereto are described. It is shown how to use 
Demerits, Demerits-per-Unit and Indexes for single components, subassemblies, assemblies 
and Systems, as well as shops and composite plants for evaluating performance quality- 
wise. Various weighting systems are introduced and evaluated. Procedures for setting up 
control charts with prescribed limits are given. Finally it is shown how to combine variables 
results with attributes data to obtain over-all quality ratings for any desired sequence of 
operations. 
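
A demerits-per-unit figure of the kind mentioned in this abstract can be sketched as a weighted count of defects divided by the number of units inspected. In the Python sketch below, the four class names follow the abstract, but the numerical weights and the function name `demerits_per_unit` are illustrative assumptions, not values given in the abstract.

```python
# Demerit weights per defect class: illustrative assumptions, not taken from the abstract.
DEMERIT_WEIGHTS = {"critical": 100, "major": 50, "minor": 10, "incidental": 1}


def demerits_per_unit(defect_counts, units_inspected, weights=DEMERIT_WEIGHTS):
    """Weighted demerits divided by the number of units inspected."""
    total = sum(weights[cls] * count for cls, count in defect_counts.items())
    return total / units_inspected


# Hypothetical inspection result for 40 units.
rating = demerits_per_unit({"critical": 0, "major": 2, "minor": 5, "incidental": 10}, 40)
```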


14. On Structural Fatigue under Random Loading. John W. Miles, Department
of Engineering, UCLA, and Douglas Aircraft Company.


Experience has shown that the fluctuating loads induced by a jet may cause fatigue 
failure of aircraft structural components. In order to throw some light on this and similar 
problems, the stress spectrum and the “equivalent fatigue stress” of an elastic structure 
subjected to random loading are studied. The analysis is simplified by assuming the struc- 
ture to have only a single degree of freedom and by using the concept of cumulative damage, 
the results being expressed in terms of quantities that can be directly measured. As an 
example, a similarity expression for the probable value of the equivalent fatigue stress of a 
panel subjected to jet buffeting is derived. 




NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.


Personal Items 


P. C. Clark has been appointed Executive Vice-President of Hunter Spring 
Company. 

Dr. Edward P. Coleman, formerly Visiting Professor, has been appointed 
Professor in the Department of Engineering, University of California at Los 
Angeles. 

Dr. R. N. Bradt of Stanford University has been appointed to an assistant 
professorship at the University of Kansas. 





Nolan A. Curry has accepted an assignment in Northern Ireland for a period of 
approximately two years. During this period his duties will be to serve as a 
Technical Consultant on Coated Abrasives manufacturing both for the Belfast 
plant of Behr-Manning Ltd., Northern Ireland and the Paris plant of Behr- 
Manning de France. Both companies are subsidiaries of Norton Behr-Manning 
Overseas which handles the overseas operations of both the Norton Company 
of Worcester, Massachusetts and Behr-Manning Corporation of Troy, New York. 

John Curtiss is now associated with the Institute of Mathematical Sciences, 
New York in the capacity of a Senior Scientist. 

Richard De Lancie has accepted a position as Senior Associate, Broadview 
Research and Development, Burlingame, California. 

Arthur N. Doi is now employed with the rank of lecturer at the College of the 
City of New York, Division of Teacher Education doing statistical work of 
descriptive and analytical nature. 

James R. Duffett has accepted a position as analytical statistician in the Guided 
Missile Reliability Program at White Sands Proving Ground. 

Dr. George L. Edgett, who has been a visiting professor in statistics at Virginia
Polytechnic Institute since the first of January returned at the end of August to 
his duties at Queen’s University, Kingston, Ontario. 

Dr. A. V. Feigenbaum is now Manager, Quality Control and Cost Reduction, 
General Electric Company, Manufacturing Services Division, Schenectady, 
New York. 

Lt. Albert A. Folop, U.S. Navy, after two years of graduate study in statistics 
at Princeton University, has been transferred to the U.S.S. Mississippi (EAG
128) where he is the Combat Information Center Officer. 

Professor E. J. Gumbel is teaching Mathematical Statistics in the Free Uni- 
versity of Berlin during the summer term 1954 under the exchange program of 
Columbia University. 

Dr. E. Cuyler Hammond has been appointed Professor of Biometry in the 
Graduate School at Yale University. He will continue in his present position as 
Director of the Statistical Research Section of the American Cancer Society. 

Harry H. Harman has resigned from the Department of the Army where he
has been Chief of the Statistical Research and Analysis Section, Personnel Re- 
search Branch, A.G.O. for the past six years and has accepted a position with 
the RAND Corporation in Santa Monica, California, where he is engaged in a 
Systems Training Program for the Air Defense Command. 

H. S. Houthakker has left the Cowles Commission for Research in Economics
to join the Department of Economics at Stanford University as an Acting As- 
sociate Professor. 

Chester H. McCall, Jr. is an instructor in Statistics at The George Washington 
University while working on his Ph.D. in Mathematical Statistics. 

Craig A. Magwire is now employed as Applied Science Representative for
International Business Machines. 

Albert Mindlin, formerly mathematical statistician at the Bureau of the Census,
has transferred to the Social Security Administration, Bureau of Old-Age
and Survivors Insurance, Baltimore, Maryland.

J. E. Morton is on leave of absence from Cornell University to the National 
Science Foundation, where his principal duties will be to plan and direct a 
survey of scientific research undertaken by American Industry. 

Jack Moshman, formerly a Senior Statistician, Mathematics Panel of the Oak 
Ridge National Laboratory, has accepted a position as a member of the Tech- 
nical Staff of the Bell Telephone Laboratories, Murray Hill, New Jersey. 

Aditya Prakash, formerly with Prudential Insurance Company, is now at 
Equitable Life Assurance Society, New York. 

F. S. Riordan, Jr. is Technical Supervisor of Quality Control, The Chemstrand
Corporation. The Chemstrand Corporation has the only integrated nylon plant 
in the world. 

John Roseboom, formerly statistician with Quality Control, Hq. Air Material 
Command, has joined the Operations Research Office of Johns Hopkins University. 

Ernest M. Scheuer has returned to his position as mathematician at the U. S.
Naval Ordnance Test Station, Pasadena Annex. 

C. H. Springer has accepted a position as Chief Engineer of the H. E. Morse 
Company of Holland, Michigan. The company is engaged in the manufacture 
of industrial gaging equipment, plating equipment and the comparoscope, a 
device for the optical evaluation of surface finishes. 

R. M. Sundrum has resigned his position as Research Associate in the Uni- 
versity of North Carolina to return to his position as Lecturer in the University 
of Rangoon. 


K. F. Thomson, formerly with Richardson, Bellows, Henry and Company, 
has accepted a position as section chief of Statistics at PRB, AGO. 




George W. Snedecor Award in Statistics 


The Statistical Laboratory of Iowa State College has been authorized to es- 
tablish ‘““The George W. Snedecor Award in Statistics,” to be given annually to 
its most outstanding candidate for the Ph.D. degree in statistics. The award will 
comprise a year’s paid membership in the Institute of Mathematical Statistics 
and a subscription to its Annals. The first recipient, chosen by vote of the gradu- 
ate faculty in statistics, is Miss Helen Bozivich, half-time associate who passed 
her preliminary examinations last fall. 




Educational Testing Service 
The Educational Testing Service is offering for 1955-56 its eighth series of
research fellowships in psychometrics leading to the Ph.D. degree at Princeton
University. Open to men who are acceptable to the Graduate School of the
University, the two fellowships each carry a stipend of $2,500 a year and are
normally renewable.

Fellows will be engaged in part-time research in the general area of
psychological measurement at the offices of the Educational Testing Service and will,
in addition, carry a normal program of studies in the Graduate School.
Competence in mathematics and psychology is a prerequisite for obtaining these
fellowships. The closing date for completing applications is January 13, 1955.
Information and application blanks will be available about November 1st and may
be obtained from: Director of Psychometric Fellowship Program, Educational
Testing Service, 20 Nassau Street, Princeton, New Jersey.




Doctoral Dissertations in Statistics, 1953


Listed below are the doctorates conferred during the year 1953 in the United 
States and Canada for which the dissertations were written on topics in statistics 
or related fields. The university, month in which degree was conferred, major 
subject, minor subject, and the title of the dissertation are given in each case if
available. 

O. P. Aggarwal, Stanford, June, major in statistics, “Bayes and Minimax
Procedures in Sampling from Finite Populations.”

V. L. Anderson, Iowa State College, June, major in statistics, “A Model for
the Study of Quantitative Inheritance.”

F. C. Andrews, California (Berkeley), September, major in mathematical
statistics, “Asymptotic Behavior of Some Rank Tests for Analysis of Variance.”

J. R. Blum, California (Berkeley), June, major in mathematical statistics,
“Strong Consistency of Stochastic Approximation Methods.”

L. D. Calvin, North Carolina, major in experimental statistics, “Doubly
Balanced Incomplete Block Designs for Experiments in which the Treatment
Effects are Correlated.”

C. L. Chiang, California (Berkeley), major in mathematical statistics, “On
Regular Best Asymptotically Normal Estimates with an Application to a
Stochastic Process.”

K. G. Clemans, Oregon, June, major in mathematical statistics, “Limiting
Distributions of Certain Statistics of the Kolmogorov-Smirnov Type.”

L. J. Cote, Columbia, major in mathematical statistics, “On Fluctuations of
Sums of Random Variables.”

R. B. Dawson, Jr., Harvard, major in mathematics, “Unbiased Tests,
Unbiased Estimators, and Randomized Similar Regions.”

H. P. Edmundson, California (Los Angeles), June, major in statistics,
“Statistical Estimation of Matrix Quantities by Means of a Class of Discrete Markov
Chains.”

R. E. Fagen, Minnesota, December, “Certain Probability Limit Theorems and
Transformations of Stochastic Processes.”


J. E. Flanagan, Illinois, June, minor in philosophy, ‘Topics in Information 
Theory.” 

Dean Gillette, California (Berkeley), June, ‘‘Representable Infinite Games.” 

J. F. Hannan, North Carolina, June, major in mathematical statistics, ‘‘Asymp- 
totic Solutions of Compound Decision Problems.” 

W. C. Hoffman, California (Los Angeles), August, major in mathematical 
statistics, ‘‘A Statistic Associated with the Joint Distribution of N Successive 
Amplitudes.” 

T. W. Horner, North Carolina, major in experimental statistics, ‘“Non-Allelic 
Gene Interaction and the Interpretation of Quantitative Genetic Data.” 

D. G. Horvitz, lowa State College, June, major in statistics, ‘“Ratio Method 
of Estimation in Sample Surveys.” 

W. W. Hoy, Ohio State, June, ‘“The Estimation of Parameters in the Ornstein- 
Uhlenbeck Process.”’ 

J. P. Hoyt, George Washington, May, major in mathematical statistics, 
“Estimates and Asymptotic Distributions of Certain Statistics in Information 
Theory.” 

D. V. Huntsberger, Lowa State College, June, major in statistics, “An Exten- 
sion of Preliminary Tests of Significance Permitting Control of Disturbances in 
Statistical Inferences.”’ 

M. C. Johnson, Minnesota, major in educational psychology, minor in sta- 
tistics, ‘‘Classification by Multivariate Analysis with Objectives of Minimizing 
Risk, Minimizing Maximum Risk, and Minimizing Probability of Misclassifi- 
cation.” 

R. M. Kozelka, Harvard, February, “On some Special Order Statistics from 
the Multinomial Distribution.” 

J. Laderman, Columbia, major in mathematical statistics, “On Statistical 
Decision Functions for Selecting One of k-populations.”’ 

G. J. Lieberman, Stanford, June, major in statistics, minor in industrial 
engineering, ‘Contributions to Sampling Inspection.” 

R. F. Link, Princeton, October, “Statistical Techniques Useful for Estimating 
the Mean Life of a Radioactive Source.” 

C. A. Magwire, Stanford, major in statistics, ‘Sequential Decisions Involving 
the Choice of Experiments.” 

O. B. Moan, Purdue, May, minor in economics, ‘“The Simultaneous Distribu- 
tion of the Mean and Range in Small Samples.” 

C. B. Moore, Kentucky, June, “On Regression for a Compound Bivariate 
Surface.” 

P. B. Moranda, Ohio State, June, ‘‘Estimation of Parameters of the Ornstein- 
Uhlenbeck and Related Stochastic Processes.” 

G. W. Morganthaler, Chicago, June, “The Central Limit Theorem for Ortho-
normal Systems—the Walsh Functions.” 

S. S. Moy, Michigan, February, “Applications of Conditional Expectation.” 

J. Pachares, North Carolina, August, major in mathematical statistics, “On 
the Distribution of Quadratic Forms.” 







E. Parzen, California (Berkeley), June, major in mathematics, “On Uniform 
Convergence of Families of Sequences of Random Variables.” 

J. Putter, California (Berkeley), June, major in mathematical statistics, “Con-
tributions to Sampling Theory and Nonparametric Hypothesis Testing.” 

R. W. Royston, Michigan, major in mathematics, “A Frequency Function 
which can be Transformed into a Gamma Type Function by a Quadratic Trans-
formation of the Variable.” 

Anne E. Scheerer, Pennsylvania, June, “Brownian Motion and the Green’s 
Function—the Plane Case.” 

Rosedith Sitgreaves, Columbia, March, major in mathematical statistics, 
“Contributions to the Problem of Classification.” 

P. N. Somerville, North Carolina, June, major in mathematical statistics, 
“Some Problems of Optimum Sampling.” 

F. L. Spitzer, Michigan, February, “On the Theory of Stochastic Processes 
which Appear in the Description of Two Dimensional Brownian Motion by Polar 
Coordinates.” 

M. Taback, Johns Hopkins, June, major in biostatistics, “Family Structure 
and Its Changing Pattern.” 

D. Teichroew, North Carolina, major in experimental statistics, “Distribution 
Sampling with High Speed Computers.” 

L. H. Wegner, Jr., Oregon, major in mathematical statistics, “Contributions 
to the Several Sample Problem.” 

L. Weiss, Columbia, January, major in mathematical statistics, “On the Use 
of Moments in Approximating Distribution Functions and Expectations.” 


D. H. Wright, George Washington, June, major in mathematical statistics, 
“Survival Probability.” 


F. M. Wright, Northwestern, August, “On the Backward Extension of Moment 
Sequences.” 

C. Zippin, Johns Hopkins, June, major in biostatistics, “Evaluation of the 
Removal Method of Estimating Animal Populations.” 




Preliminary Actuarial Examinations Prize Awards 


The winners of the prize awards offered by the Society of Actuaries to the 
nine undergraduates ranking highest on the score of Part 2 of the 1954 Pre- 
liminary Actuarial Examination are as follows: 

First Prize of $200 

I, Bask cinicians awe ....... Swarthmore College 
Additional Prizes of $160 

Bowers, Newton L. ........ Yale University 
Croteau, Robert ........ University of Montreal 
Driscoll, Francis T. ........ Yale University 
Fike, Charles T. ........ University of the South 
Freeman, David N. ........ Yale University 







Huff, Robert W. ........ College of Wooster 
Reinken, Donald L. ........ Princeton University 
Shapland, Robert ........ Drake University 
Strang, William G. ........ Massachusetts Institute of Technology 
The Society of Actuaries has authorized a similar set of nine prizes for the 
1955 examinations on Part 2. 
The Preliminary Actuarial Examinations consist of the following three exami- 
nations: 
Part 1. Language Aptitude Examination. 
(Reading comprehension, meaning of words and word relationships, anto- 
nyms, and verbal reasoning.) 
Part 2. General Mathematics Examination. 
(Algebra, trigonometry, coordinate geometry, differential and integral 
calculus.) 


Part 3. Special Mathematics Examination. 


(Finite differences, probability and statistics.) 


The 1955 Preliminary Actuarial Examinations will be prepared by the Educa-
tional Testing Service and will be administered by the Society of Actuaries at 
centers throughout the United States and Canada on May 11, 1955 (tentative 
date). The closing date for applications is March 15, 1955. 

Detailed information concerning the Examinations can be obtained from: 

The Society of Actuaries 
208 South LaSalle Street 
Chicago 4, Illinois 




New Members 


The following persons have been elected to membership in the Institute 
February 10, 1954 to May 13, 1954 


Arnold, Lester G., M.A. (Univ. of Michigan), Design Engineer, Eastman Kodak Company 
(Navy Ord.), 73 Gatewood Avenue, Rochester 11, New York. 

Bearman, Jacob E., Ph.D. (Univ. of Minnesota), Assistant Professor, School of Public 
Health, University of Minnesota, Minneapolis 14, Minnesota. 

Beyer, William H., M.S. (Virginia Polytechnic Inst.), Statistician, Aerophysics Department, 
Goodyear Aircraft Corporation, Akron, Ohio, 488 Euclid Avenue, Akron, Ohio. 

Bojarsky, Sol Melville, B.S. (Louisiana State Univ.), Mathematician, Freeport Sulphur 
Company, Grand Ecaille, Port Sulphur, Louisiana. 

Court, Arnold, M.S. (Univ. of Washington), Student, Statistical Laboratory, University 
of California, Berkeley 4, California. 

David, Herbert T., M.A. (Columbia Univ.), Research Associate, Statistical Research Center, 
University of Chicago, Chicago 37, Illinois. 

Dion, Louis G., S.B. (Massachusetts Inst. of Tech.), Quality Control Engineer, Corning 
Glass Works, Danville, Kentucky. 

Doi, Arthur N., M.A. (Univ. of Minnesota), Quality Control Statistician, Aeronautical 







Division, Minneapolis-Honeywell Regulator Company, Minneapolis, Minnesota, 
4100 Sheridan Avenue, S., Minneapolis 10, Minnesota. 

du Mas, Frank M., Ph.D. (Univ. of Texas), Assistant Professor of Psychology, Department 
of Psychology, Michigan State College, East Lansing, Michigan. 

Edmundson, Harold Parkins, Ph.D. (Univ. of California at Los Angeles), Mathematician, 
Department of Defense, 205 N. Abingdon Street, Arlington 8, Virginia. 

Fryer, William D., B.S. (Carnegie Inst. of Tech.), Assistant Physicist, Cornell Aeronautical 
Laboratory, Inc., Buffalo, New York, 27 Linda Drive, Buffalo 25, New York. 

Gates, Charles Edgar, M.S. (North Carolina State College), Assistant Statistician, Department 
of Experimental Statistics, North Carolina State College, Raleigh, North 
Carolina. 

Ghosh, Manindra Nath, D.Phil. (Univ. of Calcutta), Visiting Professor, Department of 
Biostatistics, School of Public Health, University of North Carolina, Chapel Hill, 
North Carolina. 

Giese, Wanda Williamson (Mrs. Robert W.), B.S. (Univ. of Wisconsin), Research As- 
sistant and Statistician, Organoleptic Section, Research Department, Oscar Mayer 
and Company, Madison, Wisconsin, 1127 East Gorham Street, Madison 3, Wisconsin. 

Glass, Stanley Owen, B.S. (Univ. of Wyoming), Student and part time Instructor, Sta- 
tistics, University of Wyoming, Laramie, Wyoming, 1114 Harney, Laramie, Wyoming. 

Khan, Muhammad Khalid Hayat, M.A. (Panjab Univ., Lahore), Statistical Officer, Punjab 
Health Directorate, Civil Secretariat, Lahore, West Pakistan. 

Kingsley, Edward H., M.S. (Northwestern Univ.), Student, Department of Mathematical 
Statistics, University of North Carolina, Chapel Hill, North Carolina, 639 N. Columbia 
Street, R.F.D. 2, Box 1, Chapel Hill, North Carolina. 

McCarty, Robert C., B.A. (San Jose State College), Student, Mathematical Statistics, 
University of Washington, Seattle, Washington, Apt. B-109, 801 Spring Street, Seattle 
4, Washington. 

Mundle, Peter B., B.S. (Univ. of Oregon), Student, Mathematics and Statistics, University 
of Oregon, Eugene, Oregon, 1975 Harris, Eugene, Oregon. 

Muth, John Fraser, B.S. (Washington Univ.), Student, Department of Economics, Carnegie 
Institute of Technology, Pittsburgh 13, Pennsylvania, 5050 Forbes Street, Pittsburgh 
13, Pennsylvania. 

Neter, John, Ph.D. (Columbia Univ.), Assistant Professor of Business Statistics, Syracuse 
University, Syracuse University, Syracuse 10, New York. 

Reiter, Stanley, M.A. (Univ. of Chicago), Research Associate, Applied Mathematics and 
Statistical Laboratory, Stanford University, Stanford, California. 

Riche, Charles V. Jr., M.A. (Univ. of Louisville), Student and Extension Instructor, 
Department of Extension Classes, University of Washington, Seattle, Washington, 
11739 41st N.E., Seattle 55, Washington. 

Rosenbaum, Joseph, B.A. (Univ. of California), Research Associate, Department of Biostatistics, 
Graduate School of Public Health, University of Pittsburgh, Pittsburgh, 
Pennsylvania. 

Silverman, Robert, B.S. (Ohio State Univ.), Student and Assistant Instructor, Statistics 
Laboratory, Mathematics Department, Ohio State University, Columbus 10, Ohio, 
Box 3132 University Station, Columbus 10, Ohio. 

Smith, Walter L., Ph.D. (Univ. of Cambridge), Statistician to the Medical School, University 
of Cambridge, Department of Human Ecology, Fenners, Cambridge, England. 

Stoller, David S., Ph.D. (Univ. of California at Los Angeles), Research Engineer, Aircraft 
Division, The RAND Corporation, 1700 Main Street, Santa Monica, California, 3400 
Mountain View Avenue, Los Angeles 34, California. 

Suzuki, George, Ph.D. (Univ. of Minnesota), Statistical Specialist, Applied Mathematics 
Laboratory, David Taylor Model Basin, Washington 7, D. C., 7830 Lakewood Drive, 
Falls Church, Virginia. 









Taeuber, Richard Conrad, B.A. (Swarthmore College), Student and Research Assistant, 
Institute for Research in the Social Sciences, University of North Carolina, Chapel 
Hill, North Carolina, 4222 Sheridan Street, Hyattsville, Maryland. 

Yost, Earl K., Jr., M.S. (Univ. of Oregon), Operations Analyst, Operations Analysis, HQ- 
SAC, Offutt AFB, Omaha, Nebraska, 4057 Frederick Street, Omaha 5, Nebraska. 




REPORT OF THE PASADENA MEETING OF THE INSTITUTE 


The sixty-first meeting of the Institute of Mathematical Statistics was held 
in Pasadena, California, on June 18-19, 1954. The meeting was held in conjunc- 
tion with meetings of the Biometric Society, Econometric Society, and American 
Society for Quality Control. There was one joint session with each of these 
societies. A special invited address was given by Dr. E. Cuyler Hammond, 
American Cancer Society and Yale University, on The Problem of Establishing 
Cause and Effect Relationships in the Etiology of Chronic Diseases. A beer party 
and a tea were held. A total of 143 persons attended including the following 55 
members of the Institute: 


I. J. Abrams, T. W. Anderson, K. J. Arrow, Allan Birnbaum, Z. W. Birnbaum, Charles 
Boll, A. H. Bowker, G. G. den Broeder, Jr., Bernice Brown, Douglas Chapman, K. G. 
Clemans, E. L. Crow, G. B. Dantzig, R. C. Davis, W. L. Deemer, W. J. Dixon, Robert 
Dorfman, Mary Elveback, E. A. Fay, Evelyn Fix, Martin Fox, R. S. Gardner, G. E. Ghorm-
ley, E. J. Gilbert, W. K. Green, W. C. Guenther, E. C. Hammond, T. E. Harris, W. C. 
Hoffman, J. M. Howell, Martin Krakowski, C. A. Magwire, A. W. Marshall, O. B. Moan, 
A. M. Mood, R. A. Moor, L. E. Moses, Mervin Muller, S. W. Nash, John Norton, James 
Pachares, W. W. Page, G. J. Resnikoff, R. L. Rogers, H. G. Romig, Herman Rubin, M. M. 
Sandomire, Henry Scheffé, D. S. Stoller, D. Teichroew, Elizabeth Vaughan, L. H. Wegner, 
Oscar Wesler, Bryan Wilkinson, R. K. Zeigler. 


The program of the Institute meeting was as follows: 


FRIDAY, JUNE 18, 1954 
9:30 A.M. Programming Models and Their Solutions. 


Joint session with the Econometric Society. 

Chairman: George B. Dantzig, The RAND Corporation. 

(1) A Solution of the Traveling Salesman Problem. Ray Fulkerson, The RAND Corpora- 
tion. 

Panel: Ideas for Solving Large-scale Linear Programming Models. 

Discussion: George B. Dantzig, The RAND Corporation; Robert Dorfman, University 
of California; Alan Manne, The RAND Corporation; William Orchard-Hays, The 
RAND Corporation. 


2:00 P.M. Session on Experimental Design. 


Chairman: T. E. Harris, The RAND Corporation. 

(1) Construction of Optimal Invariant Sequential Decision Procedures. M. A. Girshick and 
H. Rubin, Stanford University. 

(2) Some Models of Sequential Design. R. Bradt, University of Kansas and S. Karlin, 
California Institute of Technology. 







4:00 P.M. Contributed Papers. 


Chairman: David Stoller, The RAND Corporation. 

(1) The Integral of a Symmetric Unimodal Function over a Symmetric Convex Set and Some 
Probability Inequalities. T. W. Anderson, Columbia University and Stanford Uni- 
versity. 

(2) The Spectral Method of Hypothesis Testing Concerning Continuous Gaussian Stationary 
Random Processes. R. C. Davis, Hughes Tool Company. 

(3) Note on the Distribution of a Definite Quadratic Form. James Pachares, Naval Air 
Missile Test Center, Point Mugu. 

(4) On Simultaneous Analysis of Variance Test. (By title.) K. V. Ramachandran, 
University of North Carolina. 

(5) The Optimum Character of a Certain Wald Sequential Test. J. V. Breakwell, North 
American Aviation, Inc. Introduced by T. E. Harris. 

(6) An Optimum Decision Procedure for Ranking Means of Normal Populations. (By title.) 
K. C. Seal, University of North Carolina. 

(7) On the Central Limit Theorem for m-Dependent Variables. (By title.) P. H. Diananda, 
University of North Carolina and University of Malaya. 

(8) Estimation of a Selection Function. (Preliminary report.) Douglas G. Chapman, 
University of Washington. 

(9) On the Power of a Distribution-free One-sided Test of Fit Against Stochastically Com-
parable Alternatives. (Preliminary report.) Z. W. Birnbaum and Ernest M. Scheuer, 
University of Washington. 


SATURDAY, JUNE 19, 1954 


9:00 A.M. Social and Medical Applications. 


Chairman: Dan Teichroew, Institute for Numerical Analysis, NBS. 

(1) Occupations and Cigarette Smoking as Factors in Lung Cancer. Dr. Lester Breslow, 
Bureau of Chronic Diseases, California State Department of Health. 

(2) A Study of the Onset of Mental Disease from Admission Data. A. W. Marshall, The 
RAND Corporation. 


11:00 A.M. Special Invited Address. 


The Problem of Establishing Cause and Effect Relationships in the Etiology of Chronic 
Diseases. Dr. E. Cuyler Hammond, Yale University and American Cancer Society. 


1:00 P.M. Industrial and Engineering Application. 


Joint session with the American Society for Quality Control. 

Chairman: John Howell, Los Angeles City College. 

(1) The Uses of Probability and Statistics in Theory of Guided Missiles. Robert Muchmore, 
Ramo-Wooldridge Corporation. 

(2) A Statistical Method of Determining Relationships between Test Specification Limits 
and Performance Specification Limits. Berl D. Levenson, Hughes Aircraft Company. 

(3) A Method of Specification, Testing, and Evaluation of Missile Systems. E. J. Althaus, 
S. C. Morrison, and W. R. Tate, Hughes Aircraft Company. 

(4) Discovery Sampling. James R. Crawford, Lockheed Aircraft Corporation. 

(5) Evaluation of Quality through Demerit Rating System. Harry G. Romig, International 
Telemeter Corporation. 

(6) On Structural Fatigue under Random Loading. John W. Miles, Department of Engineer- 
ing, UCLA, and Douglas Aircraft Company. 


T. E. Harris 
Program Chairman 







Rutgers University Honorary Professorship 
The first Honorary Professorship in Statistical Quality Control, awarded by 
Rutgers University on August 11, 1954, was given to Dr. Walter A. Shewhart, pio-
neer in statistical quality control techniques and Research Engineer for the Bell 
Telephone Laboratories. 


PUBLICATIONS RECEIVED 


Wold, Herman, A Study in the Analysis of Stationary Time Series, 2nd ed., Almqvist and 
Wiksell, Stockholm, 1954, viii + 236 pp. 28 kr. 

Revista de la Facultad de Ciencias Economicas, Vol. VI, Nos. 51-52, Ministerio de Educacion, 
Universidad de Buenos Aires, 1953, 386 pp. 

Williams, J. D., The Compleat Strategyst, McGraw-Hill Book Co., Inc., New York, 1954, 
xiii + 234 pp., $4.75. 

Gubrarp, H. W. v., Untersuchungen zur inneren Verkehrslage grosser Stadtkreise, Vol. 25, 
Droste-Verlag, Dusseldorf, 64 pp. 

Lieblein, Julius, A New Method of Analyzing Extreme-Value Data, National Advisory 
Committee for Aeronautics, Technical Note 3053, National Bureau of Standards, 
Washington, January 1954, 88 pp. 

Studi di Economia e Statistica, Ser. I, Vol. I, 1951, Università di Catania, Anno Accademico 
1950-51, 302 pp. 





ESTADISTICA 


Journal of the Inter American Statistical Institute 


Vol. XII, No. 44 September 1954 
Contents 


Aspectos Metodológicos del Censo Experimental de Población, Viviendas y Enferme-
dades en Huacho, Perú, 1952    JOSEPH A. CAVANAUGH AND CARLOS A. URIARTE 
Agricultural Sample Survey of the Province of Pichincha, Ecuador, 1952 
RAYMOND J. JESSEN 
Apreciación de Cuatro Reuniones Nacionales sobre Coordinación Estadística 
OMAR DENGO O. 
Encuesta Agropecuaria de Nicaragua, 1952    ENRIQUE LANZAS B. 
Sobre la Enseñanza de la Estadística en las Facultades de Economía 
ENRIQUE CANSADO 
Uma Experiência na Formação de Estatísticos    LOURIVAL CÂMARA 
Planeamiento y Alcances Técnicos del Experimento Venezolano sobre un Sistema de 
Estadísticas Vitales    ROQUE GARCÍA FRÍAS 


Institute Affairs Statistical News Publications 


Published quarterly. Annual subscription price $3.00 (U.S.) 
Inter American Statistical Institute, Pan American Union, Washington 6, D.C., U. S. A. 





JOURNAL OF THE 
AMERICAN STATISTICAL ASSOCIATION 


1108 16th St., N.W. Washington 6, D. C. VOL. 49 NO. 266 
Statistical Methods for Poisson Processes and Exponential Populations 
ALLAN BIRNBAUM 
A New Type of Control Chart Limits for Means, Ranges, and Sequential Runs 
H. WEILER 
Technical Aspects of Transportation Flow Data    R. TYNES SMITH 
Industrial Classes in the United States, 1870 to 1950    TILLMAN H. SOGGE 
Measurement for Economic Models    STANLEY LEBERGOTT 
Applications of the Circular Normal Distribution    E. J. GUMBEL 
Response Errors in the Collection of Wage Statistics by Mail Questionnaire 
SAMUEL E. COHEN AND BENJAMIN LIPSTEIN 


STATISTICAL REVIEWS 


THE AMERICAN STATISTICAL ASSOCIATION INVITES 
AS MEMBERS ALL PERSONS INTERESTED IN: 

1. Development of new theory and method 

2. Improvement of basic statistical data 

3. Application of statistical methods to practical problems 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 22, No. 3 - July, 1954 


KENNETH J. ARROW AND GERARD DEBREU    Existence of an Equilibrium for a Competitive 
Economy. 
M. J. FARRELL    An Application of Activity Analysis to the Theory of the Firm. 
D. G. CHAMPERNOWNE    A Note on Mr. Farrell’s Model. 
S. G. ALLEN    Inventory Fluctuations in Flaxseed and Linseed Oil, 1926-1939. 
R. W. CLOWER AND D. W. BUSHAW    Price Determination in a Stock-Flow Economy. 
W. J. CORLETT    Effects on Demand of Changes in the Distribution of Income: A Comment. 
KARL BORCH    Effects on Demand of Changes in the Distribution of Income: A Reply. 
Y. K. WONG    Quasi-Inverses Associated with Minkowski-Leontief Matrices. 
J. J. POLAK AND TA-CHUNG LIU    Stability of the Exchange Rate Mechanism in a Multi-
Country System. 
Book Reviews, Announcements and Notes. 


Published Quarterly Subscription rates available on request 
The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics 
Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying 
for membership should be addressed to Rosson L. Cardwell, Secretary, The Econometric Society, The 
University of Chicago, Chicago 37, Illinois, U. S. A. 





BIOMETRIKA 


Volume 41 Contents Parts 1 and 2, June 1954 
Statistical Treatment of Censored Data. I. Fundamental Formulae. F. N. DAVID and N. L. JOHNSON. 
On the Moments of Order Statistics in Samples from Normal Populations. H. RUBEN. On the Super-
position of Renewal Processes. D. R. COX and W. L. SMITH. A Distribution-free k-Sample Test against 
Ordered Alternatives. A. R. JONCKHEERE. The Distribution Theory of Two Estimates for Standard 
Deviation Based on Second Variate Differences. A. R. KAMAT. A Bivariate Generalization of Student’s 
t-Distribution, with Tables for Certain Special Cases. C. W. DUNNETT and M. SOBEL. A Two-Sample 
Multiple Decision Procedure for Ranking Means of Normal Populations with a Common Unknown Variance. 
R. E. BECHHOFER, C. W. DUNNETT and M. SOBEL. Two-stage Procedures for Estimating the Differ-
ence between Means. S. G. GHURYE and H. ROBBINS. The Use of the Hankel Transform in Statis-
tics. I. General Theory and Examples. R. D. LORD. Inequalities for the Normal Integral Including a 
New Continued Fraction. L. R. SHENTON. A Confidence Region for the Solution of a Set of Simul-
taneous Equations with an Application to Experimental Design. G. E. P. BOX and J. S. HUNTER. The 
Statistical Treatment of Mean Deviation. J. H. CADWELL. Tests of Linear Hypotheses in Univariate 
and Multivariate Analysis When the Ratios of the Population Variances are Unknown. G. S. JAMES. On 
Nahordnung and Fernordnung in Samples of Literary Texts. W. FUCKS. New Techniques for the Analy-
sis of Absenteeism Data. A. G. ARBOUS and H. S. SICHEL. Simplified Decision Functions. G. A. 
BARNARD. A Note on the Consistency and Maxima of the Roots of Likelihood Equations. K. 
C. CHANDA. Continuous Inspection Schemes. E. S. PAGE. Grouping Methods in the Fitting of Poly-
nomials to Equally-spaced Observations. P. G. GUEST. 


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques 
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics, 
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank 
having a London agency. 





MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical liter- 
ature of the world, with full subject and author indices 


Publication of this journal is sponsored by the American Mathe- 
matical Society, Mathematical Association of America, Institute of 
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others. 


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 


Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
80 Waterman Street, Providence 6, Rhode Island 


JOURNAL OF THE 
ROYAL STATISTICAL SOCIETY 


Series B (Methodological) 

Vol. XVI, No. 1, 1954 
F. G. FOSTER AND A. STUART    Distribution-free Tests in Time-Series Based on the 
Breaking of Records (with Discussion) 
J. M. HAMMERSLEY AND K. W. MORTON    Symposium on Monte Carlo Methods: 
Poor Man’s Monte Carlo 
K. D. TOCHER    The Application of Automatic Computers to Sampling Experiments 
(with Discussion) 
J. M. HAMMERSLEY AND K. W. MORTON    Transposed Branching Processes 
N. T. J. BAILEY    On Queueing Processes with Bulk Service 
J. H. BENNETT    The Distribution of Heterogeneity upon Inbreeding 
CHARLOTTE BANKS    The Factorial Analysis of Crop Productivity 
P. A. P. MORAN    Some Experiments on the Prediction of Sunspot Numbers 
L. MANDEL    Grading with a Gauge Subject to Random Output Fluctuations 
E. S. PAGE    Control Charts for the Mean of a Normal Population 
E. S. PAGE    An Improvement to Wald’s Approximation for Some Properties of 
Sequential Tests 
H. D. PATTERSON    The Errors of Lattice Sampling 





The Royal Statistical Society, 21, Bentinck Street, London, W. 1 










SKANDINAVISK 
AKTUARIETIDSKRIFT 


1953 - Parts 3 - 4 
Contents 


ERIK SPARRE ANDERSEN    On Sums of Symmetrically Dependent Random 
Variables 
D. R. COX AND WALTER L. SMITH    A Direct Proof of a Fundamental Theorem 
of Renewal Theory 
GUSTAF BORENIUS    On the Statistical Distribution of Mine Explosions 
BENGT ULIN    An Extremal Problem in Mathematical Statistics 
BIRGER MEIDELL    Randbemerkungen zum Landréschen Maximum 
ULF GRENANDER AND MURRAY ROSENBLATT    Comments on Statistical Spectral 
Analysis 
GUNNAR BENKTANDER    On the Variation of the Risk Premium with the Dimensions 
of the House within Fire Insurance 
Litteraturanmälningar 
Eksamen i Forsikringsvidenskab og Statistik ved Københavns Universitet 
Tentamensproblem och övningsuppgifter i försäkringsmatematik vid Stockholms 
Högskola för betyget AB 
Översikt av utländska aktuarietidskrifter 


Annual subscription: $5.00 per year 
Inquiries and orders may be addressed to the Editor, 
GRANHALLSVAGEN 35, STOCKSUND, SWEDEN 


SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 
Vol. 13, Part 4, 1954 
H. B. MANN    A Theory of Estimation for the Fundamental Random Process 
and the Ornstein Uhlenbeck Process 
HENRY B. MANN AND PAUL B. MORANDA    On the Efficiency of the Least 
Square Estimates of Parameters in the Ornstein Uhlenbeck Process 
D. BASU AND R. G. LAHA    On Some Characterizations of the Normal 
Distribution 
D. BASU    On the Optimum Character of Some Estimators Used in Multistage 
Sampling Problems 
S. RUSHTON    On the Confluent Hypergeometric Function M(α, γ, x) 
S. RUSHTON AND E. D. LANG    Tables of the Confluent Hypergeometric 
Function 
M. MUKHERJEE    Estimation of National Consumption of the United 
Kingdom from Family Budget Studies 
K. K. MATHEN AND S. J. POTI    An Adjustment for the Effect of Changing 
Birth Rates on Infant Mortality Rates 
RANJAN K. SOM    Seasonality in the Incidence of Strikes in the Bombay 
Textile Industry 
Indian Statistical Institute: Twenty-first Annual Report: 1952-53 
ANNUAL SUBSCRIPTION: 30 rupees ($10.00), 10 rupees ($3.50) per issue. 
BACK NUMBERS: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue. 
Subscriptions and orders for back numbers should be sent to 
STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India 





